Metabarcoding and Metagenomics 4: 47-64
DOI 10.3897/mbmg.4.53455
Research Article
Increasing confidence for discerning species and population
compositions from metabarcoding assays of environmental
samples: case studies of fishes in the Laurentian Great Lakes and
Wabash River
Matthew R. Snyder':*°, Carol A. Stepien?
1 Genetics and Genomics Group, Department of Environmental Sciences, University of Toledo, Toledo, OH, 43606, USA (lab group led by CAS relocated to 2 and 3 in 2016)
2 Genetics and Genomics Group, Pacific Marine Environmental Laboratory, National Oceanic and Atmospheric Administration, 7600 Sand Point
Way NE, Seattle, WA, 98115, USA
3 Genetics and Genomics Group, Joint Institute for the Study of the Atmosphere and Ocean (JISAO), University of Washington, Seattle, WA, 98195, USA
Corresponding author: Carol A. Stepien (carol.stepien@noaa.gov; cstepien@uw.edu)
Academic editor: Bernd Hänfling | Received 21 April 2020 | Accepted 11 August 2020 | Published 26 August 2020
Abstract
Community composition data are essential for conservation management, facilitating identification of rare native and invasive
species, along with abundant ones. However, traditional capture-based morphological surveys require considerable taxonomic ex-
pertise, are time consuming and expensive, can kill rare taxa and damage habitats, and often are prone to false negatives. Alterna-
tively, metabarcoding assays can be used to assess the genetic identity and compositions of entire communities from environmental
samples, comprising a more sensitive, less damaging, and relatively time- and cost-efficient approach. However, there is a trade-off
between the stringency of bioinformatic filtering needed to remove false positives and the potential for false negatives. The present
investigation thus evaluated use of four mitochondrial (mt) DNA metabarcoding assays and a customized bioinformatic pipeline
to increase confidence in species identifications by removing false positives, while achieving high detection probability. Positive
controls were used to calculate sequencing error, and results that fell below those cutoff values were removed, unless found with
multiple assays. The performance of this approach was tested to discern and identify North American freshwater fishes using lab ex-
periments (mock communities and aquarium experiments) and processing of a bulk ichthyoplankton sample. The method then was
applied to field environmental (e) DNA water samples taken concomitant with electrofishing surveys and morphological identifica-
tions. This protocol detected 100% of species present in concomitant electrofishing surveys in the Wabash River and an additional
21 that were absent from traditional sampling. Using single 1 L water samples collected from just four locations, the metabarcoding
assays detected 73% of the fish species recorded during four months of an extensive electrofishing survey in
the Maumee River, along with an additional nine species. In both rivers, total fish species diversity was best resolved when all four
metabarcoding assays were used together, which identified 35 additional species missed by electrofishing. Ecological distinction
and diversity levels among the fish communities also were better resolved with the metabarcoding assays than with morphological
sampling and identifications, especially using all four assays together. At the population level, metabarcoding analyses targeting the
invasive round goby Neogobius melanostomus and the silver carp Hypophthalmichthys molitrix identified all population haplotype
variants found using Sanger sequencing of morphologically sampled fish, along with additional intra-specific diversity, meriting
further investigation. Overall findings demonstrated that the use of multiple metabarcoding assays and custom bioinformatics that
filter potential error from true positive detections improves confidence in evaluating biodiversity.
Key Words
community composition, cytochrome b, eDNA, Great Lakes, population variation, species detection, species diversity, 12S RNA
Copyright Matthew R. Snyder, Carol A. Stepien. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
48 Matthew R. Snyder & Carol A. Stepien: Metabarcode assay confidence
Introduction
Assessments of species compositions and diversities of
biological communities are fundamental for understand-
ing their ecology (Elton 1966; Begon et al. 2006; Morin
2009), facilitating conservation efforts (Myers et al. 2000;
Margules et al. 2002) and evaluating anthropogenic im-
pacts (Attrill and Depledge 1997). Identification of rare
and/or endangered species is of importance to fishery and
conservation managers (Dobson et al. 1997; Margules
et al. 2002), along with detection of non-native species
(Allendorf and Lundquist 2003). However, capture-based
surveys with morphological identifications are costly
to conduct, require extensive taxonomic expertise, and
are prone to false negatives (Attrill and Depledge 1997;
Balmford and Gaston 1999; Darling and Mahon 2011).
Metabarcoding assays employing high-throughput se-
quencing (HTS) can be used for species identifications
and calculations of community diversity, and are more
sensitive, less damaging, and relatively time- and cost-ef-
ficient than are morphological determinations from cap-
ture-based surveys (Smart et al. 2016; Deiner et al. 2017).
Some of the prior studies that have compared numbers
and biomass determinations of captured taxa to the rela-
tive proportions of sequence reads returned from metabar-
coding assays of water environmental (e) DNA samples
found positive correlations (Hänfling et al. 2016; Thomsen et al. 2016; Marshall and Stepien 2019), whereas others did not (Shaw et al. 2016; Gillet et al. 2018). Biases
due to the degree of match between primers and target se-
quences can significantly affect these relationships and/or
inhibit species detections (Xiong et al. 2016; Alberdi et al.
2017; Kelly et al. 2017). Some general assays have targeted less variable gene regions, such as mitochondrial (mt) 12S RNA, to improve the match between primers and target sequences, but these regions often limit resolution to the genus level or higher (Miya et al. 2015; Valentini et al. 2016; Cilleros et al. 2019). Metabarcoding results also have been used to
evaluate population genetic information, employing spe-
cifically-designed, targeted markers to amplify sequence
regions containing intra-specific haplotypes (Sigsgaard et
al. 2017; Parsons et al. 2018; Marshall and Stepien 2019;
Stepien et al. 2019).
Potential error sources in environmental metabarcoding assays
PCR inhibition is a challenge in some environmental
samples, leading to amplification failure or false nega-
tives (Civade et al. 2016; Fujii et al. 2019). Error from
incorrect base calls and/or sequence-to-sample mis-assignments due to index-hopping (when the wrong index is incorporated into an HTS library) can result in false positives (Xiong et al. 2016; Deiner et al. 2017). Sequencing
error and index-hopping particularly are problematic in
discerning invasive or rare species, since a false positive
can lead to wasted effort and funds, including unneces-
sary response by management agencies to verify presence
https://mbmg.pensoft.net
(Zaiko et al. 2018). More stringent bioinformatic filtering
can remove some of this error but may lower detection
capability, particularly for rare taxa. Better protocols are
needed to alleviate primer bias and error in assay data,
while correctly identifying as many taxa as possible
(Zinger et al. 2019). Inaccurate base calls in HTS also can artificially inflate estimates of population diversity based on intra-specific haplotypes (Tsuji et al. 2018), making it difficult to distinguish signal (correct haplotypes) from noise (“false” haplotypes).
Objectives
Our research objectives were to: (1) test the use of multi-
ple metabarcoding assays and an associated bioinformatic
pipeline, which combined results from primer sets to re-
duce possible sources of error and increase confidence, and
(2) evaluate the efficiency and accuracy of this approach
in field and laboratory experiments. For (2), we compared
the results with those from traditional capture-based field
sampling of fishes, morphological identifications, and
population genetic Sanger sequencing of individuals.
We tested the performance of our metabarcoding as-
says and bioinformatic pipeline with mock communities,
laboratory aquarium experiments, and processing of an
ichthyoplankton sample to assess their sensitivity in detecting inter- and intra-specific diversity (Suppl. material 1).
We applied this metabarcoding protocol to eDNA water
samples from two large rivers (Figs 1, 2), one in the Mis-
sissippi River system (the Wabash River; Experiment A1)
and the other in the Great Lakes’ watershed (the Maumee
River; Experiment A2). These river samples were taken concomitant with electrofishing surveys. We also conducted de novo sequencing of fish community eDNA from two Great Lakes sites, in Lake St. Clair and Lake Erie (Experiments A3–4: Figs 1, 2). In addition, we conducted further field experiments to assess the ability of metabarcoding assays to
discern population (haplotypic) genetic diversity (Exper-
iment Series B: Figs 1, 2).
Methods
Ethics statement and fish sampling
All fishes were collected by our lab under Ohio Depart-
ment of Natural Resources (ODNR) permit #17-159,
Michigan Department of Natural Resources permits, or
by collaborators with their permits (see Acknowledge-
ments). All native fishes except those used for the mock
communities (Suppl. material 1) were released alive and
in apparent good health immediately in the sampling
area. Invasive fishes (which cannot legally be re-released) were anesthetized and sacrificed under the approved University of Toledo IACUC protocol #205400, “Genetic studies for fishery management” (to CAS and laboratory members), using an overdose of 250 mg/L tricaine methanesulfonate (MS222; Argent Chemical Laboratories, Redmond, WA). Taxonomy and nomenclature follow www.Fishbase.org.
Metabarcoding and Metagenomics 4: e53455
[Figure 1 schematic. Series A, Community diversity: A1. Wabash River (four electrofishing transects; 1 L water sampled before and after each; all assays). A2. Maumee River (four electrofishing transects; 1 L water sampled before each; 40 additional electrofishing transects compared; all assays). A3 and A4. de novo sequencing at Lake St. Clair (A3) and the Lake Erie Islands (A4) (duplicate 1 L water samples; de novo assessment of community structure; all assays). Test types: sensitivity; semi-quantitative; environmental sampling, morphology vs. metabarcode; comparisons of community structure; intra-population diversity.]
Figure 1. Experimental design schematic, depicting Experiment Series A and B, a brief methods summary for each experiment in the Series, the aspect of metabarcoding assays tested, and the assays applied.
* Silver carp haplotypic diversity was assessed by Stepien et al. (2019).
Metabarcoding assays employed
Three metabarcoding assays designed by our lab (Stepien
et al. 2019; Snyder et al. 2020) were used, which targeted the mtDNA cytochrome (cyt) b gene to identify and
differentiate fish species (native and introduced/invasive)
from the Great Lakes and upper Wabash River regions
(Table 1, Suppl. material 1: Fig. S1). The FishCytb as-
say was a general assay designed to detect all fishes in
these ecosystems (Snyder et al. 2020), which amplified
a 154 nucleotide (NT) region of the cyt b gene, begin-
ning at NT 855. The CarpCytb assay was formulated for
invasive carps (Stepien et al. 2019), amplifying 136 NTs
beginning at NT 114, and the GobyCytb assay (Snyder
et al. 2020) was designed to distinguish invasive gobies,
amplifying 167 NTs beginning at NT 42. Another general
fish assay for part of the mt 12S RNA gene (MiFish; Miya
et al. 2015) also was used, for comparison (Table 1). The
CarpCytb and GobyCytb assays also were designed to
detect much of the known population genetic haplotypic
diversity across the target taxon group’s respective inva-
sive North American range.
All primer sets included the Illumina sequencing adapters and four unique spacer inserts, designated e–h, at the 5′ end (Table 1; Klymus et al. 2017). Spacer inserts varied from 7–14 NTs to offset sequences and increase library diversity, thereby improving the quality of HTS data on the Illumina platform (Fadrosh et al. 2014; Wu et al. 2015). The assays and their associated bioinformatic pipeline were tested on mock communities (Experiment Series C1), in which genomic DNA extracts of taxa with known sequences for the respective markers were mixed in a factorial design of varying
[Figure 2 map (graphic not reproduced): sampling sites plotted at about 85°W to 83°W longitude, with labels for Lake St. Clair (LSC 1–3), the Detroit River (DRL), Lake Erie (LEI), and the Wabash River (WAB 1–3), and state labels for Michigan and Indiana.]
Figure 2. Map showing sample sites in the Wabash River (WAB), Maumee River (MAU), Detroit River (larval fish sample; DRL),
Lake St. Clair (LSC), and Lake Erie Islands (LEI) (for Experiment Series A and B). At selected sites, morphological surveys (*)
or traditional population genetics sampling and data collection (+) were conducted and compared to eDNA metabarcoding assay
results. Wabash River (WAB) and Lake St. Clair (LSC) locations were in too close proximity to be depicted separately (geographic
coordinates are in Suppl. material 1: Table S1). Field locations were mapped using STEPMAP (stepmap.com, which holds no copy-
right on data or layers presented).
Table 1. Primers used for our metabarcoding assays. The table indicates primer element function, primer name, direction (Dir; F = forward, R = reverse), and the sequence of each primer element. Length of the region amplified (NTs; variable for 12S RNA MiFish, for which a mean is given) and annealing temperature (Ta) are provided for target-specific primers. Primer topology was 5′–Illumina sequencing adapter–spacer insert–target-specific primer–3′. Spacer inserts were from Klymus et al. (2017). Assay publications: CarpCytb (Stepien et al. 2019), GobyCytb and FishCytb (Snyder et al. 2020), and 12S RNA MiFish (Miya et al. 2015).
Function          Name          Dir  Sequence 5′–3′                       NTs    Ta
Target-specific   FishCytb      F    GCCTACGCYATYCTHCGMTCHATYCC           154    50 °C
Target-specific   FishCytb      R    GGGTGTTCNACNGGYATNCCNCCAATTCA        154    50 °C
Target-specific   CarpCytb      F    KRTGAAAYTTYGGMTCYCTHCTAGG            136    54 °C
Target-specific   CarpCytb      R    AARAAGAATGATGCYCCRTTRGC              136    54 °C
Target-specific   GobyCytb      F    AACVCAYCCVCTVCTWAAAATYGC             167    50 °C
Target-specific   GobyCytb      R    AGTCANCCRAARTTWACRTCWCGRC            167    50 °C
Target-specific   MiFish        F    GTCGGTAAAACTCGTGCCAGC                ~172   65 °C
Target-specific   MiFish        R    CATAGTGGGGTATCTAATCCCAGTTTG          ~172   65 °C
Adapter           Illumina seq  F    TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
Adapter           Illumina seq  R    GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
Spacer insert     e             F    TCCTATG
Spacer insert     e             R    CGTACTAGATGTACGA
Spacer insert     f             F    ATGCTACAGT
Spacer insert     f             R    TCACTAGCTGACGC
Spacer insert     g             F    CGAGGCTACAACTC
Spacer insert     g             R    GAGTAGCTGA
Spacer insert     h             F    GATACGATCTCGCACTC
Spacer insert     h             R    ATCGGCT
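Several target-specific primers in Table 1 contain IUPAC degenerate bases (e.g., Y = C or T, N = any base), which lets one primer match several sequence variants among target taxa. As an illustration only (this is not part of the published pipeline, and the function names are hypothetical), the number of distinct oligonucleotides each primer encodes can be computed from the standard IUPAC ambiguity codes:

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Target-specific primers transcribed from Table 1.
PRIMERS = {
    "FishCytb F": "GCCTACGCYATYCTHCGMTCHATYCC",
    "FishCytb R": "GGGTGTTCNACNGGYATNCCNCCAATTCA",
    "CarpCytb F": "KRTGAAAYTTYGGMTCYCTHCTAGG",
    "CarpCytb R": "AARAAGAATGATGCYCCRTTRGC",
    "GobyCytb F": "AACVCAYCCVCTVCTWAAAATYGC",
    "GobyCytb R": "AGTCANCCRAARTTWACRTCWCGRC",
    "MiFish F": "GTCGGTAAAACTCGTGCCAGC",
    "MiFish R": "CATAGTGGGGTATCTAATCCCAGTTTG",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} NTs, degeneracy {degeneracy(seq)}")
```

For example, FishCytb F encodes 144 distinct oligonucleotides, whereas the MiFish primers contain no degenerate positions.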
dilutions (per Klymus et al. 2017; Marshall and Stepien
2019). We also conducted aquarium experiments (Experi-
ment Series C2), which tested detection of both inter- and
intra-specific variation, and evaluated a bulk ichthyo-
plankton sample from which species had been identified
using morphological characters and microscopy (Experiment C3). These experiments are detailed in Suppl. material 1: Experiments C1–C3.
Experiment Series A: Species compositions and community diversity
To compare among the assays and with results from tra-
ditional morphological identifications of capture-based
samples, several eDNA water samples were collected
concomitant with conventional surveys (Figs 1, 2, Suppl.
material 1: Table S1). Samples were taken in bleach-ster-
ilized 1 L bottles 10 cm below the water surface, stored
on ice in brief transit from the field, and then frozen at
-80 °C. These included duplicate 1 L eDNA water sam-
ples collected before and after two electrofishing tran-
sects conducted by us on September 2, 2016 in the Wa-
bash River, IN (Experiment A1, sites WAB 1-2 in Fig. 2),
where invasive silver carp Hypophthalmichthys molitrix
was prevalent and bighead carp H. nobilis also was present.
The Ohio Environmental Protection Agency (OEPA)
conducted 44 electrofishing surveys at 22 sites in the
Maumee River, OH (a western Lake Erie, Great Lakes
tributary) in June—September 2012, from which all fishes
were identified, counted, and weighed (g) by them (OEPA
2014, 2015). Immediately prior to their sampling, they
collected water samples for us 10 cm below the surface in
sterile 1 L containers, of which four sites were analyzed
here (Experiment A2, sites MAU 1-4 in Fig. 2). We also
analyzed 1 L eDNA water samples collected in duplicate
by us from Lake St. Clair at the Harley Ensign Memorial
Boat Launch, Harrison Charter Township, MI on June 5,
2017 (Experiment A3, site LSC 1 in Fig. 2) and from Lake
Erie at the Franz Theodore Stone Laboratory, Put-In-Bay,
Gibraltar Island, OH on July 29, 2017 (Experiment A4,
site LEI in Fig. 2). Experiments A3 and A4 involved de novo sequencing of eDNA water samples without accompanying morphological survey data, with the aim of further testing the ability of these metabarcoding assays to differentiate among fish communities from various habitats.
Experiment Series B: Population genetic variation with metabarcoding assays
To evaluate population genetic compositions using metabarcoding assays of eDNA water samples (Experiment B1), 1 L of water was collected 10 cm below the surface (site LSC 2 in Fig. 2) and 10 cm above the benthos (LSC 3) at the Lake St. Clair sampling site, immediately prior to seining 60 round gobies there on November 16, 2016 (Figs 1, 2, Suppl. material 1: Table S1); complete cyt b sequences were obtained from these gobies following Snyder and Stepien (2017). Cyt b sequence haplotypes and population genetic diversity of the silver carp previously were assessed from 37 individuals from a Wabash River site (WAB 3 in Fig. 2) by Stepien et al. (2019), and were further analyzed here using duplicate 1 L eDNA water samples collected 10 cm below the surface, as above, on September 2, 2016 (Experiment B2, Figs 1, 2). Traditional population genetic sequencing results were compared with those from eDNA water samples using the CarpCytb and GobyCytb assays.
DNA capture, extraction, and library preparation
Water samples from turbid habitats often clog filters (Wil-
liams et al. 2017) and thus genetic material was captured
using centrifugation (per Stepien et al. 2019; Snyder et
al. 2020). The DNA was extracted from the pellets us-
ing Qiagen DNeasy kits (Hilden, Germany) (detailed in
Suppl. material 1). A custom library preparation (prep)
protocol (Stepien et al. 2019; Snyder et al. 2020) included
extraction controls, centrifugation controls, no-template
PCR controls (for initial amplifications and index reac-
tions), and PCR clean-up controls. Positive DNA controls
(Suppl. material 1) were used to quantify and correct for
potential errors, which might include incorrect base calls,
index-hops (see “Bioinformatic pipeline” below), and/
or cross-contamination of samples. The positive controls contained equal masses (ng) of DNA from 10 marine fish species that cannot survive in freshwater, all of which had been Sanger sequenced by us (Suppl. material 1: Table S2). Unexpected sequences of positive control taxa in the metabarcoding results (those not present in the Sanger sequencing results) were used to calculate an error rate (see “Bioinformatic pipeline” and Suppl. material 1). This library prep protocol also used primers with custom spacer inserts, which served as first-step PCR indices to facilitate detection and removal of cross-contamination and index-hopping. Index-hopping occurs when the wrong index is incorporated into an HTS library, leading to sequence-to-sample mis-assignment (Xiong et al. 2016; MacConaill et al. 2018).
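The positive-control error rate can be illustrated with a minimal sketch. The exact calculation is given in Suppl. material 1; here we assume, for illustration, that a control's error rate is the read frequency of its most abundant unexpected sequence (one not among the Sanger-sequenced control taxa), and that the largest rate across controls serves as a conservative cutoff. The function names, ASV labels, and read counts are hypothetical.

```python
from collections.abc import Iterable, Mapping


def control_error_rate(read_counts: Mapping[str, int], expected: set[str]) -> float:
    """Frequency of the most abundant unexpected sequence in one positive control.

    Assumption: this mirrors the idea of the published cutoff, not its exact formula.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("positive control returned no reads")
    unexpected = [n for seq, n in read_counts.items() if seq not in expected]
    return max(unexpected, default=0) / total


def error_cutoff(controls: Iterable[tuple[Mapping[str, int], set[str]]]) -> float:
    """Conservative cutoff (assumed): the highest error rate among positive controls."""
    return max(control_error_rate(counts, expected) for counts, expected in controls)


# Hypothetical positive control: two expected marine taxa plus two error sequences.
control = {"Gadus_ASV": 5_000, "Sebastes_ASV": 4_980, "error_ASV_1": 15, "error_ASV_2": 5}
rate = control_error_rate(control, {"Gadus_ASV", "Sebastes_ASV"})
print(f"{rate:.2%}")  # 0.15%
```

Taking the maximum across controls is only one possible design choice; a per-run or per-assay cutoff would follow the same pattern.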
Bioinformatic pipeline
We used our previously published custom bioinformatic
pipeline (Stepien et al. 2019; Snyder et al. 2020). Primers were trimmed from raw reads using a custom Python v3.7.1 script, which tolerated sequencing errors (mismatches) at up to 30% of primer NTs (a standard approach for metabarcoding assays; see Deiner et al. 2017). Since PCR errors in positive controls tended to occur at the NT positions immediately following the forward or reverse primers, those first and last NTs also were trimmed (Suppl. material 1). The trimming script additionally removed any reads having the wrong spacer insert, which may result from index-hopping or cross-contamination, and eliminated non-informative sequences shorter than 100 NTs arising from primer dimers (Khodakov et al. 2016).
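The trimming rules described above can be sketched for a single read as follows. This is a hypothetical re-implementation, not the deposited script: it assumes each read begins with its spacer insert followed by the target-specific primer, requires an exact spacer match (an assumption), counts IUPAC-aware mismatches against the primer, and applies the 30% mismatch tolerance, the one-NT trim after the primer, and the 100-NT minimum length stated in the text.

```python
from typing import Optional

# Standard IUPAC codes, so degenerate primer positions accept any allowed base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(segment: str, primer: str) -> int:
    """Count read bases not allowed by the corresponding (possibly degenerate) primer base."""
    return sum(base not in IUPAC[code] for base, code in zip(segment, primer))


def trim_read(read: str, spacer: str, primer: str,
              max_mismatch_frac: float = 0.30, min_len: int = 100) -> Optional[str]:
    """Return the trimmed insert, or None if the read fails any filter."""
    # Wrong spacer insert: possible index-hop or cross-contamination, so discard.
    if not read.startswith(spacer):
        return None
    start = len(spacer)
    segment = read[start:start + len(primer)]
    if len(segment) < len(primer):
        return None
    # Tolerate mismatches at up to 30% of primer positions.
    if primer_mismatches(segment, primer) > max_mismatch_frac * len(primer):
        return None
    # Also drop the first NT after the primer, where PCR errors were concentrated.
    insert = read[start + len(primer) + 1:]
    # Short products (e.g., primer dimers) are non-informative.
    return insert if len(insert) >= min_len else None
```

For paired-end data, the same logic would presumably be applied to reverse reads with the reverse spacer and primer.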
Trimmed reads were merged in DADA2 (Callahan et al. 2016), which corrected sequence errors using a de-noising algorithm and removed chimeras. We used DADA2 default parameters, except that maxEE (maximum expected errors) was set to c(3, 5), i.e., at most 3 and 5 expected errors in forward and reverse reads, respectively. Reads were truncated where the median quality (Q) score fell below 30, as assessed for the first 10 samples per assay per run using DADA2’s plotQualityProfile function. De-noised sequences that are identical (100% similarity) in DADA2 are termed amplicon sequence variants (ASVs).
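DADA2’s maxEE filter discards reads whose expected number of errors, EE = Σ 10^(−Q/10) summed over a read’s quality scores, exceeds the threshold (here 3 for forward and 5 for reverse reads). The sketch below re-expresses that quantity and one reading of the truncation rule (cut reads at the first cycle whose median Q across reads falls below 30) in Python. It is illustrative only, with hypothetical function names, and is not DADA2’s R implementation.

```python
from statistics import median


def expected_errors(quals: list[int]) -> float:
    """Expected number of errors in a read: sum of per-base error probabilities 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def truncation_length(quality_rows: list[list[int]], min_median_q: float = 30) -> int:
    """First cycle (0-based) at which the median Q across reads drops below min_median_q.

    Returns the full shared length if the median never drops below the threshold.
    """
    n_cycles = min(len(row) for row in quality_rows)
    for cycle in range(n_cycles):
        if median(row[cycle] for row in quality_rows) < min_median_q:
            return cycle
    return n_cycles


def passes_max_ee(fwd_quals: list[int], rev_quals: list[int],
                  max_ee: tuple[float, float] = (3, 5)) -> bool:
    """Keep a read pair only if neither read exceeds its maxEE threshold."""
    return (expected_errors(fwd_quals) <= max_ee[0]
            and expected_errors(rev_quals) <= max_ee[1])
```

A 100-NT read with every base at Q30 carries about 0.1 expected errors, so the maxEE thresholds mainly remove reads with long runs of low-quality calls.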
Unique ASVs were queried with the Basic Local Alignment Search Tool (BLAST; https://blast.ncbi.nlm.nih.gov/Blast.cgi) from the command line against custom databases, retaining the top 500 results per ASV. The custom databases consisted of all cyt b or 12S RNA (MiFish) sequences on GenBank for fishes from the Great Lakes, plus predicted future invasive species (NOAA 2019). These databases were curated by searching GenBank for “cytochrome b” or “12S RNA” and “Actinopterygii”, downloading all sequences in FASTA format, and then removing species that are neither present in the systems nor predicted future invaders (Hubbs and Lagler 2007). ASVs from Wabash River samples were queried against databases containing all available Actinopterygii fish cyt b or 12S RNA sequences.
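The curation step (keeping only GenBank sequences from species known or predicted to occur in the study systems) can be sketched as a simple FASTA filter. The header layout assumed here (accession, then genus and species as the second and third whitespace-separated fields) is typical of GenBank FASTA downloads but is an assumption, as are the function names and the placeholder accessions used in testing.

```python
from typing import Optional


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; headers exclude the leading '>'."""
    records: list[tuple[str, str]] = []
    header: Optional[str] = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def species_from_header(header: str) -> str:
    """Assumes 'ACCESSION Genus species ...'; returns 'Genus species'."""
    return " ".join(header.split()[1:3])


def curate(text: str, allowed: set[str]) -> str:
    """Return FASTA text containing only records whose species is in `allowed`."""
    kept = [f">{h}\n{s}" for h, s in parse_fasta(text) if species_from_header(h) in allowed]
    return "\n".join(kept) + ("\n" if kept else "")
```

In practice, the allowed-species list would come from a regional checklist plus predicted invaders, as described above.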
Our cyt b reference database was robust, containing sequences for > 95% of Great Lakes fishes and 100% of the present and predicted invasive species (see Snyder et al. 2020). Cyt b sequences for two native catostomid fishes, three extant Coregonus spp. and a species in that genus believed to be extinct, two cyprinids, and troutperch were absent from GenBank at the time of our study. In contrast, the Great Lakes 12S RNA reference sequence database on GenBank was missing 53 (23%) of known Great Lakes fish species. Sequences for 12S RNA were absent for many native taxa, notably 80% of Coregonus spp., 47% of catostomids, 22% of cyprinids, 21% of percids, and several others. Also absent were several current (i.e., round goby and tubenose goby) and predicted possible Great Lakes invasive species (i.e., steelcolor shiner Cyprinella whipplei, stellate tadpole goby Benthophilus stellatus, Black Sea monkey goby Neogobius fluviatilis, and Caspian Sea monkey goby Neogobius pallasi).
BLAST results (i.e., “hits”) with < 90% query cover or < 90% identity were removed. For each ASV in each assay, all unique species hits that passed this filter and had the lowest expectation (e) value (best match) were combined in a list of potential taxa. A lowest common ancestor approach was used for taxonomic assignment of sequences that did not match a single species (i.e., if all hits with the lowest e value belonged to the same genus, then taxonomic assignment was made to that level).
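The hit filtering and lowest-common-ancestor (LCA) assignment described above can be sketched as follows. Each hit is assumed to carry a lineage tuple ordered family, genus, species, plus percent identity, query cover, and e-value (e.g., parsed from tabular BLAST output); the `Hit` type and `assign_taxon` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    """One BLAST hit; lineage is ordered from family to species (an assumed layout)."""
    lineage: tuple[str, str, str]
    pident: float   # percent identity
    qcov: float     # percent query cover
    evalue: float


def assign_taxon(hits: list[Hit], min_pident: float = 90.0,
                 min_qcov: float = 90.0) -> Optional[str]:
    """Lowest common ancestor of the best-scoring hits that pass both filters."""
    passing = [h for h in hits if h.pident >= min_pident and h.qcov >= min_qcov]
    if not passing:
        return None
    best = min(h.evalue for h in passing)
    top = {h.lineage for h in passing if h.evalue == best}
    shared: list[str] = []
    for ranks in zip(*top):          # walk family -> genus -> species
        if len(set(ranks)) != 1:     # best hits disagree at this rank: stop
            break
        shared.append(ranks[0])
    return shared[-1] if shared else None
```

In this study, assignments resolvable only to family were subsequently discarded (see Results), which here would correspond to rejecting a family-level return value.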
For metabarcoding assay results, species incidences were scored as valid when their read frequencies exceeded the error cutoff calculated from the positive controls (see “DNA capture, extraction, and library preparation” above and Suppl. material 1) in a single assay, or when they occurred in multiple assays (hereafter termed the multi-assay approach). This approach aimed to reduce false positives and simultaneously compensate for potential primer set biases. We compared those results with the use of 0.1% as the cutoff (the known rate of index-hopping on MiSeq; MacConaill et al. 2018), and also evaluated all positive hits (without frequency-based filtering, but passing the % identity and query cover filter). Metabarcoding sequence data for replicates from the same sampling site were combined by first applying the bioinformatic filter to each replicate separately and then pooling ASVs and read counts (independently for each primer set). Species detections per assay and for the combined assays (multi-assay approach) were compared with results from the accompanying morphological capture-based sampling. Scripts for this bioinformatic pipeline and the custom BLAST databases are deposited in the Dryad database (https://doi.org/10.5061/dryad.7m0cfxprx). FASTQ files for all sequenced samples are in the NCBI Sequence Read Archive (BioProject Accession: PRJNA625378).
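The multi-assay scoring rule can be expressed compactly. In this sketch (hypothetical function name, taxa, and counts; replicate handling simplified), a taxon is valid if its read frequency exceeds the assay’s error cutoff in at least one replicate of at least one assay, or if it is detected by two or more assays at any frequency. Read counts are assumed to have already passed the identity and query-cover filter.

```python
from collections import defaultdict


def multi_assay_valid(
    results: dict[str, list[dict[str, int]]],
    cutoffs: dict[str, float],
) -> set[str]:
    """Taxa scored valid under the multi-assay rule (illustrative sketch).

    results: assay -> list of replicates -> taxon -> read count
    cutoffs: assay -> error cutoff as a fraction of reads (e.g., 0.0027)
    """
    above_cutoff: set[str] = set()
    assays_detecting: dict[str, set[str]] = defaultdict(set)
    for assay, replicates in results.items():
        for counts in replicates:
            total = sum(counts.values())
            for taxon, reads in counts.items():
                if reads <= 0:
                    continue
                assays_detecting[taxon].add(assay)
                # Frequency filter applied to each replicate separately, before pooling.
                if reads / total > cutoffs[assay]:
                    above_cutoff.add(taxon)
    in_multiple = {t for t, assays in assays_detecting.items() if len(assays) >= 2}
    return above_cutoff | in_multiple
```

With a hypothetical cutoff of 0.27% for both assays, a taxon at about 0.1% of reads in both FishCytb and MiFish would be retained, whereas the same frequency in a single assay would be rejected.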
Data analyses
Results from our metabarcoding assays were compared with their concomitant morphological identifications from traditional capture-based surveys (i.e., electrofishing transects in A1 and A2) or with Sanger sequencing of sampled fish (B1 and B2). Species appearing unique to the metabarcoding assays and/or capture-based surveys were evaluated for individual samples and sampling regions. Some morphological or metabarcoding identifications were restricted to the genus or family levels (see). False negatives were determined using a relaxed detection criterion, which recorded a species as present when it was distinguished at the genus level in either the morphological or metabarcoding results. Species richness values were compared between the morphological and metabarcoding approaches (and among the individual and multi-metabarcoding assays) using t-tests in R (R Core Team 2018), with significance adjusted using sequential Bonferroni corrections (hereafter, SBC; Rice 1989). Biomass proportions of species from morphological identifications were compared statistically with the proportions of sequence reads for single metabarcoding assays using linear models and Spearman rank correlation coefficients in R. Analysis of covariance (ANCOVA) compared the slopes of these relationships among the different assays to evaluate concordance (Zar 2010).
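These analyses were run in R; to make the arithmetic concrete, the sketch below re-implements two of the ingredients in plain Python. Spearman’s rank correlation is computed as the Pearson correlation of average ranks (so ties are handled), and the sequential Bonferroni correction is implemented as Holm’s step-down procedure, the sequential Bonferroni method recommended by Rice (1989), which also underlies R’s `p.adjust(method = "holm")`. Function names are hypothetical.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman correlation is undefined for a constant variable")
    return cov / (vx * vy) ** 0.5


def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm (sequential Bonferroni) adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted
```

For example, raw p-values of 0.01, 0.04, and 0.03 become roughly 0.03, 0.06, and 0.06 after adjustment, so only the first remains significant at α = 0.05.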
Habitat differentiation was assessed with non-metric multidimensional scaling (NMDS) using Bray-Curtis dissimilarity in VEGAN (Oksanen et al. 2019) for morphological identifications and for individual and multi-metabarcoding assays, based on presence/absence (binary) data. Differences in species compositions between the Wabash (A1) and Maumee (A2) rivers were tested statistically using permutational multivariate ANOVA with the ADONIS2 function in VEGAN. The assumption that groups of points did not differ significantly in their distances to the group centroid (homogeneity of multivariate dispersion) was tested using BETADISPER. Habitat comparisons were further explored using FASTCLUSTER in R. Metabarcoding assay and capture-based survey dendrograms were constructed using binary distances (based on presence/absence of taxa) and Ward’s D2 agglomeration method (Müllner 2013). PVCLUST calculated approximately unbiased (AU) percent bootstrap support for each node in the dendrogram (10,000 replications), using the same distance and agglomeration methods (Suzuki and Shimodaira 2015).
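For presence/absence data, the two dissimilarities named above have simple set forms. Bray-Curtis on binary data reduces to 1 − 2|A∩B| / (|A| + |B|) (the Sørensen dissimilarity), while R’s `dist(method = "binary")` is the share of taxa present at only one of two sites among taxa present at either (the Jaccard distance). The sketch below (hypothetical function names; the value returned for two empty sites is an assumed convention) computes both; the ordination and Ward clustering themselves were done in VEGAN and FASTCLUSTER.

```python
from typing import Callable


def bray_curtis_binary(a: set[str], b: set[str]) -> float:
    """Bray-Curtis dissimilarity for presence/absence data (equals Sørensen dissimilarity)."""
    if not a and not b:
        return 0.0  # assumed convention for two empty sites
    return 1 - 2 * len(a & b) / (len(a) + len(b))


def binary_distance(a: set[str], b: set[str]) -> float:
    """Taxa found at only one site divided by taxa found at either site (Jaccard distance)."""
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sites
    return len(a ^ b) / len(union)


def distance_matrix(sites: dict[str, set[str]],
                    metric: Callable[[set[str], set[str]], float]) -> dict[tuple[str, str], float]:
    """Pairwise distances between all sites (upper triangle, sorted site names)."""
    names = sorted(sites)
    return {(x, y): metric(sites[x], sites[y])
            for i, x in enumerate(names) for y in names[i + 1:]}
```

Because both measures ignore joint absences, taxa missing from both sites being compared do not make those sites more similar.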
Numbers and proportions of population haplotypes were calculated from HTS reads and from traditional Sanger sequencing of individuals. FST values and exact tests of population differentiation in Arlequin (Excoffier and Lischer 2010) compared the proportions of haplotypes found with the traditional and metabarcoding methods.
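Haplotype proportions from HTS reads are each haplotype’s share of reads, which can then be set against Sanger-based proportions. Arlequin estimates FST through an analysis of molecular variance; as a simpler stand-in for illustration only, the sketch below computes Nei’s GST from two sets of haplotype frequencies (0 when the proportions are identical, 1 when no haplotypes are shared). Function names and counts are hypothetical, and this is not the estimator used in the study.

```python
from statistics import mean


def haplotype_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert haplotype counts (reads or individuals) into proportions."""
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


def nei_gst(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Nei's GST = (HT - HS) / HT for two sets of haplotype frequencies."""
    haplotypes = set(p1) | set(p2)
    # Mean within-sample haplotype diversity.
    hs = mean([1 - sum(p.get(h, 0.0) ** 2 for h in haplotypes) for p in (p1, p2)])
    # Diversity of the pooled frequencies.
    pooled = {h: (p1.get(h, 0.0) + p2.get(h, 0.0)) / 2 for h in haplotypes}
    ht = 1 - sum(f ** 2 for f in pooled.values())
    return (ht - hs) / ht if ht > 0 else 0.0
```

For instance, Sanger counts of 30 and 10 individuals for two haplotypes give proportions of 0.75 and 0.25; if eDNA reads yield the same proportions, GST is 0.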
Results
High-throughput sequencing metrics
No PCR amplifications occurred in the 0 hr control samples from the goby aquarium experiments (C2) or in any of our negative extraction, centrifugation, no-template PCR, indexing, or clean-up controls (see Suppl. material 1). A total of 27,961,011 sequence reads were obtained among all libraries (mean ± SE per sample per assay = 229,189 ± 18,645; Suppl. material 1: Table S3). A mean of 204,320 ± 17,738 reads per sample per assay was successfully trimmed. DADA2 merged an average of 0.80 ± 0.01 of trimmed reads, yielding a mean of 75.4 ± 11.4 ASVs per sample per assay. Of those, a mean of 23 ± 1.7 had BLAST hits to our fish databases that passed the ≥ 90% identity and query cover filter (mean query cover = 99.83 ± 0.01%, mean identity = 99.14 ± 0.02%). Single species identifications included 2,342 ASVs (89% of the total 2,631 hits passing the filter), and 59, 103, 43, and 75 of the hits were to the genus level for the FishCytb, CarpCytb, GobyCytb, and 12S RNA MiFish assays, respectively. These included 14 genera, primarily Carassius (13% of the overall genus-level hits), Carpiodes (19%), and Ictiobus (39%), for which morphological identifications and/or another of our metabarcoding assays resolved a congener to species. Nine 12S RNA hits were resolvable only to the family level, of which six were Cyprinidae and three were Catostomidae; all were discarded. Not all eDNA samples led to successful libraries for every assay, presumably due to primer-specific inhibition (dashes in Table 2). For all positive controls, the most abundant unexpected sequence was closely related to an expected sequence. Error frequencies calculated from the positive controls ranged from 0.18 to 0.42% (mean = 0.27 ± 0.02%).
Experiment Series A: Metabarcoding assays versus morphological identifications
Laboratory experiments and analyses of the larval fish
sample showed that our assays and accompanying bio-
informatic pipeline were highly sensitive, with few false
negatives and high detection probability (Suppl. mate-
Table 2. Sample diversity based on morphological identifications versus metabarcoding results (from Experiment Series A and B).
(A) Species richness from morphological surveys and metabarcoding assays. (B) Number of species (richness) discerned with mor-
phology and species uniquely found with metabarcoding results (morphological “false negatives”). Proportion of false negatives in
metabarcoding results (in parentheses). Regional samples were combined, respectively, for the Maumee (1-4) and Wabash (1-2)
rivers. For the Maumee River, “all” indicates results for all species in summer 2012 electrofishing surveys regardless of whether
their concomitant eDNA data were processed.
A Richness
Location Morphology Multi-assay FishCytb CarpCytb GobyCytb 12S RNA MiFish
Maumee River 1 13 38 19 14 18 18
Maumee River 2 22 25 15 12 9 7
Maumee River 3 23 26 18 10 6 16
Maumee River 4 23 28 17 6 15 19
Maumee River 1-4 33 42 26 20 24 31
Lake St. Clair 1 - 16 6 7 8 8
Lake St. Clair 2 - 16 - 10 12 12
Lake St. Clair 3 - 16 9 11 9 -
Lake St. Clair all - 23 6 16 16 8
Lake Erie Islands - 14 yi, 8 - 9
Wabash River 1 13 30 8 14 - 21
Wabash River 2 12 29 14 10 5 16
Wabash River 3 - 27 17 11 9 11
Wabash River 1-2 18 37 22 20 10 30
B Richness Unique to metabarcoding assays (false negatives)
Location Morphology Multi-assay FishCytb CarpCytb GobyCytb MiFish
Maumee River 1 13 20 (0.00) 1 (0.23) 1 (0.38) 3 (0.23) 7 (0.31)
Maumee River 2 22 6 (0.41) 3 (0.45) 3 (0.73) 1 (0.73) 1 (0.77)
Maumee River 3 23 4 (0.26) 0 (0.57) 1 (0.70) 0 (0.78) 3 (0.52)
Maumee River 4 23 10 (0.30) 1 (0.61) 1 (0.83) 4 (0.65) 6 (0.57)
Maumee River 1-4 33 18 (0.12) 2 (0.36) 4 (0.67) 6 (0.58) 9 (0.39)
Maumee River all 59 9 (0.27) 0 (0.56) 2 (0.76) 3 (0.64) 6 (0.54)
Wabash River 1 13 16 (0.23) 0 (0.54) 5 (0.54) - 13 (0.46)
Wabash River 2 12 14 (0.08) 7 (0.67) 1 (0.50) 2 (0.83) 7 (0.17)
Wabash River 1-2 18 21 (0.00) 9 (0.33) 5 (0.39) 2 (0.83) 14 (0.22)
https://mbmg.pensoft.net
54 Matthew R. Snyder & Carol A. Stepien: Metabarcode assay confidence
rial 1). Several false negatives, i.e., taxa discerned by morphological identifications but not by single metabarcoding assays using the calculated error cutoff, were positive when 0.1% was used for filtering (N = 13) or when all ASVs were accepted (N = 48). Our multi-assay approach detected more species than did the single assays (see “Community comparisons” below). Using the multi-assay approach, just one false negative (with the calculated error cutoff) was positive when all ASVs were accepted. When ASVs above 0.1% were accepted, several index-hops were apparent, including for the Black Sea sprat Clupeonella cultriventris, a possible future invader of the Great Lakes that has not been documented in North America (NOAA 2019), and for silver carp outside of its known established invasive range in the Mississippi River basin (Kolar et al. 2005). These hits likely were reads mis-assigned (index-hops) from mock communities on the same run, or resulted from cross-contamination, since both species had been previously Sanger sequenced in our lab. Hits for cod Gadus spp. and rockfish Sebastes spp., marine taxa that were used as positive controls here, also appeared in some metabarcoding results from eDNA samples when 0.1% was used as the frequency-based filtering cutoff. Thus, we used the calculated error cutoff because of the possibility of false positives under less stringent filtering conditions.
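The three filtering options compared above (accepting all ASVs, a fixed 0.1% cutoff, and a cutoff calculated from positive controls) differ only in the minimum read proportion an ASV must reach to be kept. The sketch below assumes, hypothetically, that the calculated cutoff is the largest read proportion reached by any unexpected ASV in a positive control of known composition; the authors' actual calculation may differ, and the function names, ASV labels, and read counts are illustrative only.

```python
from typing import Dict, Set


def error_cutoff(control_reads: Dict[str, int], expected: Set[str]) -> float:
    """Hypothetical cutoff: the highest read proportion reached by any
    unexpected ASV in a positive control of known composition."""
    total = sum(control_reads.values())
    unexpected = [n / total for asv, n in control_reads.items() if asv not in expected]
    return max(unexpected, default=0.0)


def filter_asvs(sample_reads: Dict[str, int], cutoff: float) -> Dict[str, float]:
    """Keep ASVs whose share of the sample's reads is at or above the cutoff."""
    total = sum(sample_reads.values())
    proportions = {asv: n / total for asv, n in sample_reads.items()}
    return {asv: p for asv, p in proportions.items() if p >= cutoff}


# Illustrative marine positive control (cod + rockfish) with one error ASV.
control = {"Gadus_ASV1": 6000, "Sebastes_ASV1": 3950, "Gadus_error_ASV": 50}
cut = error_cutoff(control, expected={"Gadus_ASV1", "Sebastes_ASV1"})

# Illustrative eDNA sample containing two low-frequency, likely spurious hits.
sample = {"silver_carp": 8970, "common_carp": 960, "sprat": 40, "cod": 30}
for label, c in [("all ASVs", 0.0), ("0.1%", 0.001), ("calculated", cut)]:
    print(label, sorted(filter_asvs(sample, c)))
```

With these made-up counts, the 0.1% filter retains the spurious sprat and cod hits, whereas the control-derived cutoff (0.5% here) removes them, mirroring the trade-off described in the text.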
Morphological identifications from capture-based surveys did not completely overlap the metabarcoding assay results. For the total of all samples, our multi-assay metabarcoding approach detected more taxa than did morphological determinations (Table 2A). Results revealed that 51 taxa (74%) were in common between both approaches in the Wabash (A1) and the Maumee (A2) rivers (Suppl. material 1: Table S4). Since hybrids identified morphologically (i.e., hybrid striped bass Morone chrysops × saxatilis) possess a single, maternally inherited mtDNA genome, our metabarcoding assays discerned the maternal species alone. Metabarcoding assays identified to species some samples that were morphologically unresolved. Several metabarcoding false negatives were restricted to the genus level, owing either to a lack of taxonomic resolution or to incomplete BLAST databases. With those corrections, 93% of the species detected in capture-based surveys (conducted concomitant with eDNA water sampling) also were identified by the metabarcoding assays. There was no relationship between the degree of primer sequence match and false negatives (Suppl. material 1: Fig. S1). A positive (but weak) correlation was observed between species biomass and sequence reads (Suppl. material 1: Table S5, Fig. S2). This correlation also was not significantly related to the degree of primer sequence match.
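The biomass-read relationship above can be summarized with a correlation coefficient. This section does not state which coefficient was used (see Suppl. material 1: Table S5), so the sketch below assumes Pearson's r computed on per-species proportions of total biomass and total reads; the species keys and values are hypothetical.

```python
import math
from typing import Dict, Sequence


def to_proportions(values: Dict[str, float]) -> Dict[str, float]:
    """Scale per-species values so that they sum to 1."""
    total = sum(values.values())
    return {species: v / total for species, v in values.items()}


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical catch biomass (kg) and eDNA read counts for four species.
biomass = {"sp1": 12.0, "sp2": 4.5, "sp3": 1.2, "sp4": 0.1}
reads = {"sp1": 52000, "sp2": 9000, "sp3": 14000, "sp4": 800}
species = sorted(biomass)
b, r = to_proportions(biomass), to_proportions(reads)
print(round(pearson_r([b[s] for s in species], [r[s] for s in species]), 2))
```

A rank-based coefficient (Spearman's rho) would be a reasonable alternative when biomass is dominated by a few large-bodied species.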
Experiment A1: Morphology discerned 18 fish species belonging to five families from electrofishing surveys in the Wabash River. Our metabarcoding assays identified all of those (100%) to species, along with an additional 20 species (Fig. 3, Table 2B, Suppl. material 1: Table S4).
Experiment A2: 33 species belonging to 11 families were detected with electrofishing surveys conducted concomitant with eDNA water sampling from four sites in the Maumee River. Our metabarcoding analyses detected 29 (88%) of those, along with an additional 19 species. A total of 59 species belonging to 12 families were collected among all 44 morphological surveys from 22 Maumee River sites across four months of intensive sampling by the OEPA during summer 2012. Our metabarcoding assays discerned 43 (73%) of those species and an additional nine species from just single 1 L water samples at only four of the sites (corresponding to 9% of the OEPA’s surveys and 18% of their total number of sampling sites).
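The percentages in Experiments A1 and A2 are simple ratios of detected to captured species, and of sampled to total surveys and sites. A minimal Python check using only the counts stated in the text (the helper name `percent` is illustrative):

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts taken from the text of Experiments A1 and A2.
checks = {
    "Wabash species detected (18 of 18)": percent(18, 18),
    "Maumee concomitant species detected (29 of 33)": percent(29, 33),
    "Maumee summer-long species detected (43 of 59)": percent(43, 59),
    "OEPA surveys sampled for eDNA (4 of 44)": percent(4, 44),
    "OEPA sites sampled for eDNA (4 of 22)": percent(4, 22),
}
for label, value in checks.items():
    print(f"{label}: {value}%")  # prints 100%, 88%, 73%, 9%, 18% in turn
```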
Only four of the fish species from the Maumee River electrofishing surveys conducted concomitant with our four single eDNA water samples were not detected with metabarcoding assays: northern hogsucker Hypentelium nigricans, longnose gar Lepisosteus osseus (the sole false negative in our high-diversity aquarium experiments; Suppl. material 1: Experiment C2), stonecat Noturus flavus, and white crappie Pomoxis annularis (Fig. 3, Table 2B, Suppl. material 1: Table S4). Thirteen additional species uncovered in the Maumee River across the OEPA’s entire four-month-long morphological electrofishing survey did not occur in metabarcoding results from the four water samples (Table 2B), including rock bass Ambloplites rupestris, golden redhorse Moxostoma erythrurum, five darter species, four small cyprinids, and two madtom (Noturus) species.
As expected, species results from the single assays did not completely overlap. Of the 347 individual species detections across all samples, 111 (32%) occurred in single assays. Twenty-one (6%) of the detections were scored as positive according to the multi-assay criteria, meaning that their hits fell below the cutoff values but occurred in multiple assays in the same sample. Mean proportions of false negatives from single assays in samples taken concomitant with electrofishing surveys were 0.48 ± 0.04. When all samples from a single region were combined, this value fell to 0.34 ± 0.04. The multi-assay approach had significantly fewer false negatives after SBC than did the single assays (p < 0.004 for all). Mean proportions of false negatives using the multi-assay approach were 0.17 ± 0.05 for the individual sampling sites and 0.09 ± 0.03 when all samples from each region were combined. Six of the common false negatives from the 12S RNA assay were due to species lacking reference 12S RNA sequences in GenBank (i.e., quillback Carpiodes cyprinus, highfin carpsucker Ca. velifer, shorthead redhorse Moxostoma macrolepidotum, ghost shiner Notropis buchanani, and white crappie).
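The multi-assay criterion can be read as: a species is positive if it reaches the error cutoff in any single assay, or if it falls below the cutoff but still appears in several assays of the same sample (as the Discussion puts it, ASVs below the cutoff “were removed unless they occurred in multiple markers”). The sketch below assumes a minimum of two assays and per-assay cutoffs; those choices, the data layout, and all function names are hypothetical. The false-negative proportion is the share of morphologically captured species that metabarcoding missed; for example, 0.23 at a site with 13 captured species corresponds to three missed species.

```python
from typing import Dict, Set


def single_assay_calls(props: Dict[str, float], cutoff: float) -> Set[str]:
    """Species whose read proportion in one assay reaches that assay's cutoff."""
    return {sp for sp, p in props.items() if p >= cutoff}


def multi_assay_calls(
    props_by_assay: Dict[str, Dict[str, float]],
    cutoffs: Dict[str, float],
    min_assays: int = 2,
) -> Set[str]:
    """Union of single-assay calls, plus species that fall below the cutoff
    but are still present in at least `min_assays` assays of the same sample."""
    accepted: Set[str] = set()
    below_counts: Dict[str, int] = {}
    for assay, props in props_by_assay.items():
        accepted |= single_assay_calls(props, cutoffs[assay])
        for sp, p in props.items():
            if 0 < p < cutoffs[assay]:
                below_counts[sp] = below_counts.get(sp, 0) + 1
    rescued = {sp for sp, n in below_counts.items() if n >= min_assays}
    return accepted | rescued


def false_negative_proportion(captured: Set[str], detected: Set[str]) -> float:
    """Share of morphologically captured species missed by metabarcoding."""
    return len(captured - detected) / len(captured)


# Hypothetical read proportions for one water sample.
sample = {
    "FishCytb": {"sp_a": 0.50, "sp_b": 0.002},
    "MiFish": {"sp_b": 0.003, "sp_c": 0.001},
    "CarpCytb": {"sp_a": 0.20},
}
cutoffs = {"FishCytb": 0.005, "MiFish": 0.005, "CarpCytb": 0.005}
calls = multi_assay_calls(sample, cutoffs)
print(sorted(calls), false_negative_proportion({"sp_a", "sp_c", "sp_d"}, calls))
```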
Complete (100%) detection from metabarcoding assays occurred for Experiment A2 in the MAU 1 sample, which had the lowest morphological species richness (Table 2B). Comparing the four eDNA water sample metabarcoding results to the entire scope of the electrofishing surveys conducted by the OEPA in the Maumee River throughout summer 2012, each single assay detected between 26% (CarpCytb) and 48% (12S RNA MiFish) of the overall species. The multi-assay approach discerned 73% of the species overall in this watershed-level community, based on just four 1 L water samples (one each from four different sites).
[Figure 3 (stacked bar chart, percentage of species, 0-100%): fish species grouped by family, detected with morphology only (Morph), with both morphology and metabarcoding (Morph & metabarcode), or with metabarcoding only (Metabarcode), for Maumee R. 1-4, Maumee R. all, and Wabash R. 1-2.]
Figure 3. Families (and numbers of fish species, in parentheses) detected with morphology (Morph), metabarcoding assays, or both methods (Experiment Series A). Samples taken concomitant with electrofishing surveys were combined (Maumee River 1-4, Wabash River 1-2). Maumee River all: comparison of four eDNA water samples to 44 electrofishing transects from 22 sites in the Maumee River, June to September 2012.
A mean of 4.6 ± 1.0 taxa in the single assays, or 14.5 ± 5.3 using the multi-assay approach, were undetected in the concomitant morphological samples. Two apparent false positives occurred with the 12S RNA MiFish assay. There were several apparent matches to the non-native blacktip jumprock Moxostoma cervinum in the Wabash River, likely because most Moxostoma species are absent from the 12S RNA database: six of the seven species known from the watershed (Simon 2006) lacked reference sequences. Instead, other Moxostoma spp. were identified by the cyt b assays and/or morphology in every sample for which the 12S RNA assay displayed a hit for blacktip jumprock. These false hits from the 12S RNA assay were discarded from the final dataset. The other apparent false positive from the 12S RNA assay was the marine white croaker Genyonemus lineatus (native to the Eastern Pacific and not known to be able to survive in fresh water), which was not present in the positive controls. Just one species in that family is known to occupy the sampled regions, the freshwater drum Aplodinotus grunniens. The drum had a single 12S RNA reference sequence in the BLAST database, which multiple 12S RNA ASVs from other samples matched. Lacking a plausible explanation for the white croaker hits, we also removed them from the final dataset.
The metabarcoding assays found every invasive species collected in the morphological capture-based surveys, including silver carp, common carp Cyprinus carpio, flathead catfish Pylodictis olivaris, round goby, and white perch (Suppl. material 1: Table S4). Ghost shiner eDNA was not detected at two Maumee River sites (A2) where it was physically collected, but was found in metabarcoding assay results from another sample in the region. Both of those false negatives occurred at < 0.1% of the total fish biomass. Our assays also detected invasive species in more samples than did the capture-based surveys. For example, just one electrofishing transect in the Wabash River (A1) caught silver carp, yet every metabarcoding sample detected that species in at least one assay. Our assays detected invasive grass carp in the Maumee (A2) and Wabash (A1) rivers, where it is known to occur but was not caught. Tubenose goby was not captured in our Maumee River samples (A2), but was present in the eDNA metabarcoding results (and is known to occur there).
Experiment Series A: Community comparisons
Some species were detected in just one of the geographic regions we sampled. In the Wabash River (Experiment A1), these were blue sucker Cycleptus elongatus and invasive silver carp Hypophthalmichthys molitrix, identified both with morphology and metabarcoding assays, and gravel chub Erimystax x-punctatus and mooneye Hiodon tergisus, identified by eDNA metabarcoding assays alone. Species detected in the Maumee River samples alone were pumpkinseed Lepomis gibbosus, orangespotted sunfish Lepomis humilis, invasive ghost shiner, spotted sucker Minytrema melanops, and common logperch Percina caprodes, with both morphology and metabarcoding assays, and black crappie Pomoxis nigromaculatus, spoonhead sculpin Cottus ricei, orangethroat darter Etheostoma spectabile, and invasive tubenose goby, solely with metabarcoding assays (Experiment A2). We surveyed Lakes St. Clair (Experiment A3) and Erie (Experiment A4) with eDNA metabarcoding assays alone. Black bullhead Ameiurus melas occurred solely in the Lake Erie Islands sample, and invasive chum salmon Oncorhynchus keta solely in Lake St. Clair (where it has been introduced for sport fishing).
Species richness values obtained from single samples, as well as from regional analyses, were higher using the multi-assay approach than with any single metabarcoding assay or morphology (Table 2A). Richness values from morphology and metabarcoding assays were not significantly correlated. Richness values for the Wabash (A1) and Maumee (A2) rivers were significantly greater with the multi-assay approach than with the other methods, including single assays and morphology (p < 0.004 for all). Notably, the numbers of taxa detected with the multi-assay approach that were undetected by morphological capture-based sampling were greater than the numbers of false negatives in all but two samples (MAU 2-3; Table 2B). Numbers of replicates and/or samples collected per region were significant predictors of species richness for single-assay metabarcoding (R² = 0.73, p < 0.001) and multi-assay results (R² = 0.79, p < 0.001).
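The R² values above express how much of the variation in richness is explained by sampling effort. As a minimal sketch, assuming a simple one-predictor least-squares fit of richness on the number of samples (the authors' exact model and its p-values are not reproduced here), R² is 1 minus the ratio of the residual to the total sum of squares. The input numbers are hypothetical.

```python
from typing import Sequence, Tuple


def fit_r_squared(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical illustration only: samples pooled per region vs. detected richness.
n_samples = [1, 2, 3, 4, 6]
richness = [16, 23, 27, 30, 37]
print(fit_r_squared(n_samples, richness))
```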
NMDS plots discerned more discrete groupings of regional samples with multi-assays than with single assays (Fig. 4). Significant differences in distances to the centroid were not found for the Wabash (A1) and Maumee (A2) rivers with any method (morphology, individual assays, or multi-assays). Multi-assays (df = 1, F = 6.32, p = 0.030) as well as the separate FishCytb (df = 1, F = 3.00, p = 0.031), CarpCytb (df = 1, F = 4.73, p = 0.030), and 12S RNA MiFish (df = 1, F = 4.18, p = 0.028) assays resolved significant differences between the Wabash and Maumee river fish communities (A1 and A2), whereas the GobyCytb assay and morphological surveys did not.
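The ordination and tests above rest on binary Bray-Curtis dissimilarity (Fig. 4), which for presence/absence data equals the Sørensen dissimilarity, (b + c) / (2a + b + c), where a is the number of shared species and b and c are the numbers found in only one of the two samples. A minimal sketch with hypothetical species lists follows; a pairwise matrix like this is the usual input to NMDS and to dispersion and PERMANOVA-type tests (for example, the metaMDS, betadisper, and adonis2 functions of the R package vegan), although this sketch makes no assumption about which software the authors used.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def binary_bray_curtis(s1: Set[str], s2: Set[str]) -> float:
    """Bray-Curtis dissimilarity on presence/absence data (equivalent to Sørensen)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    shared = len(s1 & s2)
    return 1 - 2 * shared / total


def dissimilarity_matrix(samples: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Binary Bray-Curtis values for every unordered pair of sample labels."""
    return {
        (a, b): binary_bray_curtis(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical presence/absence lists (not the study's data).
samples = {
    "WAB 1": {"silver carp", "blue sucker", "gizzard shad"},
    "WAB 2": {"silver carp", "gizzard shad", "mooneye"},
    "MAU 1": {"gizzard shad", "round goby", "pumpkinseed"},
}
for pair, d in dissimilarity_matrix(samples).items():
    print(pair, round(d, 3))
```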
Some samples did not cluster by geographic region
in the dendrograms when single assay results were used
together with morphological identifications (Fig. 5A).
When the multi-assay approach and morphological data
were used, all samples clustered according to region,
showing improved resolution and site-specific discrimi-
nation (Fig. 5B). Lentic and lotic sites were well-differ-
entiated. Fish communities from the two lake sites (Lakes
Erie and St. Clair) were more similar to each other than
either was to the river samples, in the multi-assays as well
as in most single assays.
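The dendrograms (Fig. 5) use binary distances with Ward's D2 agglomeration. In R, dist(method = "binary") gives the Jaccard distance (the proportion of species present in only one of two samples among those present in either), and hclust(method = "ward.D2") applies Ward's criterion to squared dissimilarities. The dependency-free Python sketch below re-implements that combination with the Lance-Williams update formula; it is an illustration with hypothetical samples, not the authors' code.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

Cluster = FrozenSet[str]


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Binary (Jaccard) distance between two species lists."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def ward_d2(samples: Dict[str, Set[str]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerate samples with Ward's criterion on squared Jaccard distances.

    Returns the merges in order as (cluster_1, cluster_2, merge_height)."""
    labels = sorted(samples)
    sizes: Dict[Cluster, int] = {frozenset([lab]): 1 for lab in labels}
    # Squared distances between clusters, keyed by the unordered pair of clusters.
    sq: Dict[FrozenSet[Cluster], float] = {}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            d = jaccard_distance(samples[a], samples[b])
            sq[frozenset([frozenset([a]), frozenset([b])])] = d * d
    merges = []
    while len(sizes) > 1:
        pair = min(sq, key=sq.get)  # closest pair of current clusters
        ci, cj = tuple(pair)
        d_ij = sq.pop(pair)
        ni, nj = sizes.pop(ci), sizes.pop(cj)
        merged = ci | cj
        # Lance-Williams update for Ward's method applied to squared distances.
        for ck, nk in sizes.items():
            d_ki = sq.pop(frozenset([ck, ci]))
            d_kj = sq.pop(frozenset([ck, cj]))
            sq[frozenset([ck, merged])] = (
                (ni + nk) * d_ki + (nj + nk) * d_kj - nk * d_ij
            ) / (ni + nj + nk)
        sizes[merged] = ni + nj
        # Report heights on the original distance scale, as ward.D2 does;
        # max() guards against tiny negative values from non-Euclidean inputs.
        merges.append((ci, cj, math.sqrt(max(d_ij, 0.0))))
    return merges


# Hypothetical species lists for two river and two lake samples.
samples = {
    "WAB 1": {"silver carp", "blue sucker", "gizzard shad"},
    "WAB 2": {"silver carp", "gizzard shad"},
    "LSC 1": {"round goby", "yellow perch", "rock bass"},
    "LSC 2": {"round goby", "yellow perch"},
}
for left, right, height in ward_d2(samples):
    print(sorted(left), sorted(right), round(height, 3))
```

Ward's criterion is formally defined for Euclidean distances; applying it to Jaccard distances, as in the ward.D2 usage described in the caption, is common practice for presence/absence data.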
[Figure 4 (NMDS ordination, plotted along Axis 1 and a second axis): samples from the Wabash River, Maumee River, Lake St. Clair, Lake Erie Islands (Lake Erie Is.), and Detroit River larvae (Detroit R. larvae), with morphology-based samples indicated separately.]
Figure 4. Non-metric multi-dimensional scaling plot based on binary Bray-Curtis dissimilarity for presence/absence of species detected by metabarcoding assays and morphological capture-based methods (when both were conducted concomitantly; Experiment Series A).
Experiment Series B: Intra-specific population diversity from metabarcoding assays
Aquarium experiments using round gobies possessing haplotypes RG 1, 8, and 57 showed no false negatives, but some false positives fell above the error cutoff (Suppl. material 1: Experiment C3). Traditional Sanger sequencing of tissue samples identified three round goby haplotypes in Lake St. Clair (Experiment B1): “RG 1” (78% of individuals), “8” (12%), and “57” (10%) (Fig. 6). All three haplotypes were found in the benthic eDNA water sample processed with the GobyCytb assay. The two rare haplotypes were absent from surface water eDNA (even upon accepting all ASVs of any frequency or searching raw unmerged sequences). Both surface and benthic water samples contained multiple haplotypes above the calculated error cutoff that had not been detected by the Sanger sequencing analysis and were not in GenBank.
Sanger sequencing discerned three silver carp haplotypes that were physically sampled in the Wabash River (Experiment B2: designated as “SC A”, “B”, and “H”), constituting 49%, 48%, and 3% of that population sampled at a separate time (Stepien et al. 2019). The CarpCytb assay differentiated among all three of these haplotypes in the eDNA water samples, whose read proportions were 67%, 30%, and <0.5%, respectively, in a different year (Fig. 6). Three additional, previously undiscovered haplotypes were detected, all above the calculated error cutoff and at greater frequency than the rare haplotype “H”. Comparisons of our metabarcoding assays with traditional population genetics based on Sanger sequencing of mtDNA haplotypes showed significant frequency differences after SBC using FST (mean FST = 0.179, p < 0.0002 for all) and exact tests (p < 0.0001 for all).
[Figure 5 (dendrograms): (A) tips are individual-assay samples (Fish, Carp, Goby, MiFish) and morphological samples (Morph) from MAU 1-4, WAB 1-3, DRL, LEI, and LSC 1-3; (B) tips are multi-assay (Multi) and morphological (Morph) samples from the same sites.]
Figure 5. Dendrogram of relationships among metabarcoding and morphological samples, using binary distances and Ward’s D2 agglomeration method (Experiment Series A). (A) Results from individual metabarcoding assays and morphological data. Fish = FishCytb, Carp = CarpCytb, Goby = GobyCytb, MiFish = 12S RNA, Morph = morphological sampling. (B) Results from the multi-assay approach (Multi) and morphological (Morph) data. See Fig. 2 for color key and site abbreviations.
[Figure 6 (stacked bars, y-axis: percent haplotype, 0-100%): round goby haplotypes from the GobyCytb assay in LSC 2 (surface) and LSC 3 (benthos), legend RG 1, RG 8, RG 57, and RG N1-N5; silver carp haplotypes from the CarpCytb assay in WAB 3, legend SC A, SC B, SC H, and SC N1-N3.]
Figure 6. Population genetic haplotypic diversity assessed with metabarcoding assays versus traditional DNA sequencing (Experiment Series B). Round goby (RG) haplotypes in Lake St. Clair (LSC 2: surface, LSC 3: benthos) and silver carp (SC) haplotypes in the Wabash River (WAB) assessed with traditional population genetic sequencing (Trad) and the GobyCytb and CarpCytb metabarcoding assays. New cytochrome b haplotypes (N) have not been described from either species to date; those with sequence frequencies <1% are unlabeled for visual clarity.
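FST summarizes how much haplotype frequencies differ between two profiles, here Sanger-sequenced individuals versus eDNA read proportions. The sketch below uses a simplified, Nei-style estimator, FST = (HT - HS) / HT, where HS is the mean within-profile haplotype diversity and HT is the diversity of the pooled frequencies. It ignores sample-size corrections and sequence divergence and includes no significance test, so it will not necessarily match the estimator or the p-values reported above; the input frequencies are hypothetical.

```python
from typing import Dict


def normalize(freqs: Dict[str, float]) -> Dict[str, float]:
    """Rescale counts or percentages so that they sum to 1."""
    total = sum(freqs.values())
    return {h: f / total for h, f in freqs.items()}


def haplotype_diversity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity, 1 - sum(p^2), for one set of haplotype frequencies."""
    return 1 - sum(p * p for p in freqs.values())


def fst_two_profiles(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Simplified (Nei-style) FST between two haplotype-frequency profiles, equally weighted."""
    pa, pb = normalize(a), normalize(b)
    haplotypes = set(pa) | set(pb)
    pooled = {h: (pa.get(h, 0.0) + pb.get(h, 0.0)) / 2 for h in haplotypes}
    h_s = (haplotype_diversity(pa) + haplotype_diversity(pb)) / 2
    h_t = haplotype_diversity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical percentages for tissue-based versus eDNA read-based profiles.
sanger = {"A": 50, "B": 45, "C": 5}
edna = {"A": 65, "B": 30, "C": 1, "N1": 4}
print(round(fst_two_profiles(sanger, edna), 3))
```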
Discussion
Our multi-assay metabarcoding approach and accompanying bioinformatic pipeline demonstrated high detection probability, similar to or better than that of traditional morphological sampling, with few false negatives and additional species discerned despite considerably less sampling effort. The custom pipeline improved overall sequence run quality and removed apparent index-hops and/or cross-contamination by using spacer inserts, which served as indices for the initial amplifications. Sequencing error was calculated using positive controls of marine species that could not live in this freshwater environment. ASVs whose proportions were below the error cutoff were removed unless they occurred in multiple markers. Proportions of sequence reads showed weak but positive correlations with species biomass. Metabarcoding results for the round goby and silver carp assays identified all of the haplotypic variation found with traditional population genetics Sanger sequencing. Additional “new” haplotypes found with metabarcoding assays may have resulted from sequencing error not removed by our pipeline, meriting further testing.
Experiment Series A and B: Reducing error from metabarcoding assays
We evaluated various frequency-based filters to eliminate index-hops and/or cross-contamination from positive controls or other samples (since the likelihood that sequencing error would result in a BLAST hit to a different species was low). Given the large number of samples that can be pooled on an HTS run, such sources of error could result in false positives (Xiong et al. 2016; MacConaill et al. 2018). In our investigation, sequencing run quality was improved, and risks of cross-contamination or index-hops were reduced, using the custom spacer insert library preparation protocol, which served to index the first-step PCR. Our custom trimming script removed the majority of these errors. Although the most common error in every positive control was closely related to an expected sequence (implicating sequencing error as the origin of these unexpected ASVs), our calculated error cutoff was the sole method that eliminated all apparent index-hops. Several studies have used positive controls to apply frequency-based bioinformatic filtering (Hanfling et al. 2016; Port et al. 2016; see Deiner et al. 2017). That approach likely led to false negatives in our single assays. However, when the two targeted and the two general primer sets were combined in our multi-assay approach, false negatives and false positives were greatly reduced.
Experiment Series A: Relative abundances of species from metabarcoding assays
The cyt b assays revealed strong positive relationships between input concentrations of DNA and the proportions of sequence reads in mock communities (Suppl. material 1: Experiment C1). eDNA water samples showed weaker, but most often positive, relationships, depending on the assay used. These relationships likely were affected by environmental conditions, such as eDNA transport and settling rates in water (Deiner and Altermatt 2014; Pont et al. 2018), which vary under physical and biological conditions (Barnes et al. 2014; Jo et al. 2019), and by whether the eDNA was intra- or extracellular/organellar (Turner et al. 2014). Variable abundances of mitochondria in different types of cells, shed at different rates by different species, also affect these relationships (Robin and Wong 1988; Klymus et al. 2017; Jo et al. 2019). Some authors have proposed that corrections for specific taxa, their sizes, life stages, and environmental conditions could be applied (van Bochove et al. 2020). Subsampling during collection and library preparation, as well as primer biases, also could have affected our results (Deiner et al. 2017).
Hanfling et al. (2016) found that metabarcoding assay read abundances were positively correlated with the findings of long-term concomitant capture-based surveys of fishes in three United Kingdom lakes. Van Bleijswijk et al. (2020) found that fish biomass caught in fyke net surveys in a tidal inlet between the North and Wadden Seas showed a weak positive correlation with metabarcoding sequence read abundances for just the eight most abundant species. Positive correlations with sequence reads at higher taxonomic levels appear to be more common in most studies (Thomsen et al. 2016; Gillet et al. 2018), although since different taxa within a family often are not ecological equivalents (e.g., benthic invertivore darters and several piscivorous species in Percidae), those results are less useful than comparisons at the species level. Results here indicated that the selected assay influenced these relationships, as corroborated by other studies (Gillet et al. 2018).
Experiment Series A: Resolution from metabarcoding assays versus conventional sampling
Elucidating overlap in community diversity between metabarcoding assays and traditional morphological capture-based survey methods was an important goal of our Experiment Series A. Other investigations have shown wide variation in species overlap between these approaches, ranging from 25% (Gillet et al. 2018; Cilleros et al. 2019) to > 90% (e.g., Hanfling et al. 2016; Port et al. 2016; Stoeckle et al. 2017). Few studies have identified 100% of the capture-sampled species with metabarcoding assays from a single site or watershed, as discerned here for the Wabash River (A1) and for one of our Maumee River sites (A2). To our knowledge, most other investigations that achieved high resolution likewise employed multiple primer sets and/or were conducted in low-diversity environments (Civade et al. 2016; Shaw et al. 2016; Evans et al. 2017; Fujii et al. 2019).
The trend for higher detection efficiency across broader geographic regions or watersheds, compared to findings at individual sampling sites, is common (Cilleros et al. 2019; Fujii et al. 2019; Lawson Handley et al. 2019), indicating that the overall sampling effort, numbers, volumes, and/or spatial extents of eDNA sampling often are not equivalent to the coverage of many morphological surveys. Shaw et al. (2016) achieved 100% detection regionally for freshwater fish species but had lower success on a per-site basis. The high identification efficiency (few false negatives and a larger number of species uniquely distinguished) in the present results likely was achieved through our use of multiple metabarcoding assays, since our sampling was limited. The 100% detection efficiency in the Wabash River may have been aided by our collection of duplicate samples; one replicate at each site was collected before and another after the electrofishing transect, which mixed the water and incorporated eDNA from other species.
Sequence identifications have been shown to depend on the stringency of bioinformatic filtering (Alberdi et al. 2017; Evans et al. 2017), which we tested here by comparing different error cutoffs for accepting ASVs as “true positives”. Even when all sequences were accepted regardless of frequency, our metabarcoding assays did not identify all species present in the summer-long Maumee River conventional capture-based surveys conducted by the OEPA (A2). Since we analyzed just single 1 L eDNA water samples taken at only four of those sites (9% of electrofishing surveys), our study demonstrated very high efficiency despite a very minimal sampling effort. Evans et al. (2017) used varying stringencies of bioinformatic filtering based on the numbers of samples and/or assays in which species were detected, comparing metabarcoding results to morphological identifications. Their low- and moderate-stringency bioinformatic methods found all 10 species that were present in capture-based surveys of a small Michigan lake, with three false negatives under their highest-stringency criteria. Those authors used rarefaction to show that more than eight samples needed to be processed with all three assays for the species accumulation curve to reach an asymptote. Because of sampling constraints, a similar analysis could not be applied here.
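Rarefaction, as used by Evans et al. (2017), asks how detected richness grows as samples are added. A closely related, sample-based species accumulation curve can be sketched by averaging cumulative richness over random orderings of the samples. The function name, permutation count, seed, and species lists below are hypothetical, and this is not a reproduction of Evans et al.'s analysis.

```python
import random
from typing import List, Sequence, Set


def accumulation_curve(
    samples: Sequence[Set[str]], permutations: int = 200, seed: int = 1
) -> List[float]:
    """Mean cumulative species richness after adding 1, 2, ..., n samples in random order."""
    rng = random.Random(seed)
    n = len(samples)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        seen: Set[str] = set()
        for position, index in enumerate(order):
            seen |= samples[index]
            totals[position] += len(seen)
    return [total / permutations for total in totals]


# Hypothetical species lists from four eDNA water samples.
samples = [
    {"gizzard shad", "channel catfish", "round goby"},
    {"gizzard shad", "emerald shiner"},
    {"channel catfish", "freshwater drum", "round goby"},
    {"gizzard shad", "freshwater drum", "white bass"},
]
print([round(x, 2) for x in accumulation_curve(samples)])
```

If the last few points of the curve are nearly level, additional samples would be expected to add few new species; a curve still rising steeply at its end indicates under-sampling.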
Unsurprisingly, more intensive sampling regimes increased the total diversity obtained and improved overlap between metabarcoding assays and capture-based surveys (Evans et al. 2017; Bylemans et al. 2018; Cantera et al. 2019). The time and effort needed to obtain community composition data with the metabarcoding approach here was much less than with traditional methods. On-site filtering of larger quantities of water also can improve overlap between metabarcoding and morphological identification results (Civade et al. 2016). Some species may be difficult to detect with eDNA because of their behavior (Barnes and Turner 2016). For example, longnose gar, a lie-in-wait ambush predator, was undetected by our metabarcoding assays in the Maumee River and also in the high-diversity aquarium experiments; gar may not shed much eDNA while stationary in the water. As indicated here, the joint use of several targeted and general metabarcoding assays can increase detection when eDNA sampling is limited. Technical replicates (replicate amplifications of the same extractions) also may increase accuracy; however, their effectiveness has been shown to be less than that of biological replicates (Alberdi et al. 2017).
Traditional capture or visual surveys can be thwarted by physical or environmental conditions (Fujii et al. 2019). The lowest morphology-based species richness value was obtained near the mouth of the Maumee River (Experiment A2: Maumee River 1 at river mile 9.4), one of the deepest locations sampled, which had high suspended solids that decreased visibility (OEPA 2014), conditions under which electrofishing is less effective. In its concomitant eDNA water sample, our assays identified 100% of those species and an additional 20. Fujii et al. (2019) applied metabarcoding assays to backwater lakes in Japan, achieving 100% detection efficiency where capture-based methods were deemed difficult and sampled diversity was low. Different nets and sampling gear possess different biases, selectively capturing some species while leaving others unsampled, with capture avoidance varying with habitat conditions and among species.
The 100% detection efficiency of our metabarcoding assays at MAU 1 may have been due to eDNA samples resolving a larger spatial extent than capture-based methods, particularly in large lotic systems (Cilleros et al. 2019; Fremier et al. 2019). Just three of the 13 species captured in the morphological surveys at that site were not discerned upstream. Transport of eDNA in large rivers such as the Maumee River has been recorded up to 130 km (Pont et al. 2018), which may explain our higher resolution at that site.
Invasive species discovered solely with our metabarcoding assays all were within their known geographic ranges, except for round goby in the Wabash River (A1). The round goby also was identified from eDNA in nearby bait shops (Snyder et al. 2020) and may have expanded its range into the region. Its Wabash River detection occurred in a single assay and sample; thus, there is a possibility that it was a false positive not removed by our pipeline. Other studies also have identified invasive species with metabarcoding assays that were absent from morphological surveys (Zaiko et al. 2018; Fujii et al. 2019).
Experiment Series A: Community diversity patterns with metabarcoding assays
eDNA metabarcoding assays often have described greater diversity in habitats than morphological identifications from capture-based sampling (see Deiner et al. 2017), as determined here (Experiment Series A). In some cases, species occurring solely in metabarcoding results may stem from an incomplete reference database, which can lead to sequence hits for closely related taxa instead of those actually present (Cantera et al. 2019). Our use of a robust cyt b database and removal of a few improbable hits from the 12S RNA results circumvented that problem. Some of the species uniquely found with our metabarcoding assays in the Maumee (A2) or Wabash (A1) rivers either had small body sizes (golden shiner, mooneye) or were benthic (bowfin Amia calva, spoonhead sculpin, blue catfish Ictalurus furcatus, and round and freshwater tubenose gobies), rendering them less susceptible to electrofishing capture. Several studies likewise have shown that metabarcoding can detect species that often evade traditional capture (Hanfling et al. 2016; Port et al. 2016; Bessey et al. 2020).
Our multi-assay approach and most of the single metabarcoding assays differentiated taxonomic compositions and diversity levels among geographic regions more effectively than did morphological capture-based surveys. This finding is in concert with results from other metabarcoding investigations (Cilleros et al. 2019). The present multi-assay approach also correctly distinguished lentic from lotic habitats, likely because of the greater diversity revealed when results from multiple markers were combined. West et al. (2020) found significant variation in community composition among locations in an extensive metabarcoding survey of a tropical coral reef in the Cocos (Keeling) Islands, Australia, detecting 46 species that previously were undocumented from those sites.
Experiment Series B: Population genetic patterns
from metabarcoding results
Our goby aquarium experiments (Suppl. material 1: Ex-
periment C3) and additional studies have demonstrated
the potential of metabarcoding assays to assess popula-
tion-level diversity (Marshall and Stepien 2019). How-
ever, given the false haplotypes present in our goby
aquarium experiments, there is a possibility that new
diversity in our metabarcoding results (absent from tra-
ditional sampling) stemmed from sequence error. Field
analyses conducted here suggest that traditional data col-
lection methods may yield different population genetic
results than metabarcoding assays of eDNA water samples
(Experiments B1–B2). Although our targeted assay and
sampling from the appropriate location in
the water column detected all of the haplotypic diversity
discerned with traditional single-individual sequencing,
the proportions of those haplotypes differed. Factors that
can affect species proportions assessed with metabarcod-
ing assays versus physical sampling likewise influenced
these population genetic results.
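One way to quantify how far haplotype proportions from single-individual sequencing differ from those in eDNA reads is the total variation distance between the two frequency distributions (half the summed absolute differences). The sketch below is a generic illustration under that assumed metric; the function names are hypothetical and any counts used with it here are made up, not our data.

```python
def proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert haplotype counts (individuals or reads) to relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return {hap: n / total for hap, n in counts.items()}


def total_variation_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the summed absolute frequency differences; 0 = identical, 1 = disjoint."""
    haplotypes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(h, 0.0) - q.get(h, 0.0)) for h in haplotypes)
```

A distance near zero alongside full haplotype recovery would indicate agreement in both identity and proportion; a larger distance with full recovery matches the pattern described above, in which all haplotypes were detected but their proportions differed.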
Aquarium eDNA experiments by Tsuji et al. (2018)
detected mtDNA control region haplotypes of the ayu
Plecoglossus altivelis, but despite DADA2’s denoising
algorithm (albeit without any frequency-based bioinformatic
filtering), they discovered 31 false haplotypes, with seven
occurring across all 15 replicates. This likely reflected
non-random error on the HTS platform, which poses challenges
for gathering population genetic data (Nakamura et al. 2011;
Schirmer et al. 2016). Some false haplotypes also may be
present in our eDNA field study findings, meriting further
analyses.
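A frequency-based filter of the kind mentioned above might be sketched as follows: keep a haplotype only if it reaches a minimum fraction of a replicate's reads in at least a minimum number of replicates. The function name and both thresholds are hypothetical. As the Tsuji et al. (2018) result suggests, a systematic error that recurs in every replicate at moderate frequency would pass such a filter, so it reduces but cannot eliminate false haplotypes.

```python
def filter_haplotypes(replicates: list[dict[str, int]],
                      min_fraction: float = 0.01,
                      min_replicates: int = 2) -> set[str]:
    """Keep haplotypes whose within-replicate read fraction is >= min_fraction
    in at least min_replicates replicates (illustrative thresholds)."""
    support: dict[str, int] = {}
    for reads in replicates:
        total = sum(reads.values())
        if total == 0:
            continue  # an empty replicate supports nothing
        for hap, n in reads.items():
            if n / total >= min_fraction:
                support[hap] = support.get(hap, 0) + 1
    return {hap for hap, k in support.items() if k >= min_replicates}
```

Raising either threshold removes more putative errors at the cost of discarding rare true haplotypes, mirroring the false-positive versus false-negative trade-off at the species level.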
To our knowledge, few investigations have exam-
ined whether haplotype identities and their frequencies
from traditional Sanger sequencing of tissue samples
matched those found with metabarcoding assays in the
environment. Parsons et al. (2018) compared 88 harbor
porpoise Phocoena phocoena tissue extractions that had
been Sanger sequenced for the mtDNA control region with
results from a metabarcoding assay of 36 eDNA water samples
collected in the fluke prints of diving aggregations. Five
of the 28 known haplotypes also were recovered in the
metabarcoding results, along with three additional haplo-
types (two of which were previously unknown and might
constitute false haplotypes). A whale shark Rhincodon
typus aggregation was sampled offshore in the Arabian
Gulf with biopsy spears and sequenced for the complete
mtDNA control region by Sigsgaard et al. (2017). Water
from eight sites where whale sharks were observed was
sampled in triplicate, and the extractions were pooled and
sequenced using two metabarcoding assays targeting portions
of the same gene. One of the assays yielded haplotype
frequencies very similar to those determined from tissue
sampling, whereas the other did not; recovery of all
haplotypes was attributable to the large number of eDNA
water samples (N = 24) collected.
Haplotypic diversities of populations targeted for me-
tabarcoding assays must be evaluated in order to design
primers that are best able to differentiate them. In theory,
metabarcoding assays can distinguish more variation and/or
more accurately assess population genetic diversity, because
far larger numbers of individuals may be screened, as
Marshall and Stepien (2019) did for thousands of dreissenid
mussel larvae. Accuracy of results is influenced by the
assay’s design and the
targeted taxa, specifically the gene region selected and
the match of the primers. Metabarcoding applications
for population genetic studies hold promise, but need
ground-truthing, careful design, and verification.
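The benefit of screening more individuals can be seen in the standard unbiased estimator of haplotype (gene) diversity, h = n/(n − 1) · (1 − Σ p_i²), where n is the number of sampled gene copies and p_i the frequency of haplotype i. The minimal sketch below implements that formula; the function name is hypothetical. Note that n must count individuals, not reads: many reads can derive from one fish, so treating read counts as n would overstate the effective sample size.

```python
def haplotype_diversity(counts: dict[str, int]) -> float:
    """Unbiased haplotype (gene) diversity, h = n/(n-1) * (1 - sum(p_i^2)),
    where n is the number of sampled gene copies and p_i the haplotype frequencies."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled gene copies")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

The n/(n − 1) correction matters most for small samples; with thousands of screened individuals it approaches 1, and the estimate is driven almost entirely by the observed frequencies.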
Conclusion
Using the multi-assay approach, our metabarcoding pipeline
achieved high species-level discrimination and a low
probability of false positives. This was accomplished by
using multiple primer sets to alleviate false negatives
stemming from possible bias, and by applying stringent
bioinformatic filtering to reduce false positives from
cross-contamination, index-hopping, or sequencing error.
Results from these primer sets were combined in a logical way.
The multi-assay approach discerned nearly all of the di-
versity sampled over much more extensive traditional
electrofishing surveys, and yielded an appreciable num-
ber of additional species. We found that our multi- and
single assays alike better differentiated among ecological
regions and their communities than did morphological
identifications from conventional sampling.
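The stringent filtering described above could take many forms. As a hypothetical sketch only, not the rule implemented in our pipeline, a taxon can be dropped from a sample unless its read count exceeds both the largest count of that taxon in any negative control and an assumed small fraction (`hop_rate`) of that taxon's total reads on the sequencing run, a crude allowance for index-hopping from abundant samples.

```python
def clean_sample(sample: dict[str, int],
                 negatives: list[dict[str, int]],
                 run_totals: dict[str, int],
                 hop_rate: float = 0.001) -> dict[str, int]:
    """Hypothetical filter: keep a taxon only if its count exceeds
    (a) its largest count in any negative control, and
    (b) hop_rate times its total reads across the whole run."""
    cleaned: dict[str, int] = {}
    for taxon, n in sample.items():
        neg_max = max((neg.get(taxon, 0) for neg in negatives), default=0)
        hop_floor = hop_rate * run_totals.get(taxon, 0)
        if n > neg_max and n > hop_floor:
            cleaned[taxon] = n
    return cleaned
```

Both floors scale with evidence of contamination: taxa seen in blanks, or very abundant elsewhere on the run, need more reads to be accepted in a given sample.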
Future work likewise should employ a library preparation and
bioinformatic pipeline that reduces error by indexing the
initial PCR and removing cross-contamination and
index-hopping. Such research also should assess
effectiveness using positive controls and/or mock
communities for every assay on every run. Multiple values
for frequency-based filtering should be evaluated with these
positive controls, mock communities, and test samples to
determine which performs best. More intensive eDNA
water sampling at each location likely would improve the
performance of the multi-assay metabarcoding results
presented here. Metabarcoding reads potentially could be
used as a proxy for proportional taxon abundances within
the system, but results are marker dependent and should
be interpreted with caution. Current technological limita-
tions may render population genetic analyses using me-
tabarcoding data problematic, but as technology improves,
error incidences decline, and longer sequence read lengths
become more feasible, this application shows promise.
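Evaluating several frequency-based filtering values against a mock community of known composition might be sketched as below. The threshold grid, the scoring (total false positives plus false negatives), and the tie-break toward the smaller threshold are all assumptions for illustration, as are the function names.

```python
def evaluate_thresholds(reads: dict[str, int],
                        expected: set[str],
                        thresholds: list[float]) -> list[tuple[float, int, int]]:
    """For each minimum read fraction, count false positives (detected but not in
    the mock community) and false negatives (in the mock community but not detected)."""
    total = sum(reads.values())
    if total == 0:
        raise ValueError("no reads")
    results = []
    for t in thresholds:
        detected = {sp for sp, n in reads.items() if n / total >= t}
        results.append((t, len(detected - expected), len(expected - detected)))
    return results


def best_threshold(results: list[tuple[float, int, int]]) -> float:
    """Fewest total errors; ties broken toward the smaller threshold."""
    return min(results, key=lambda r: (r[1] + r[2], r[0]))[0]
```

Repeating this for every assay on every run would show whether a single threshold suits all markers or whether marker-specific values perform better.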
Data resources
Scripts for the bioinformatic pipeline and custom
BLAST databases are in the Dryad database
(https://doi.org/10.5061/dryad.7m0cfxprx). FASTQ files for
all samples sequenced are in the NCBI Sequence Read Archive
(BioProject Accession: PRJNA625378). Suppl. material 1
contains further details, including additional experimental
results.
Acknowledgements
This is contribution 4967 from the NOAA Pacific Marine
Environmental Laboratory (PMEL) and 2020-1061 from
the University of Washington’s Joint Institute for the
Study of the Atmosphere and Ocean (JISAO). We thank
Nathaniel Marshall for aiding in library preparation and
pipeline development; Thomas Blomquist, Carson Prich-
ard, and James Willey for early consultation and work on
primer development; Edward Roseman for collecting and
identifying the species in the larval fish sample; Yuriy
Kvach, Matthew Neilson, and Mariusz Sapota for col-
lecting unestablished potential invaders for use in mock
communities; Mark Pyron for conducting electrofishing
surveys in the Wabash River; David Altfater and the en-
tire Ohio Environmental Protection Agency stream sam-
pling crew for conducting the electrofishing surveys and
collecting water samples in the Maumee River; and Keith
Wernert for obtaining permission for us to sample display
aquarium 2 and providing the census of fishes within. The
laboratory of C.A.S. provided logistical support, espe-
cially Shane Yerga-Woolwine and Anna Elz. Additional
support for grant and project management was provided
by Thomas Ackerman, Frederick Averick, David But-
terfield, Frank Calzonetti, Kevin Czajkowski, Timothy
Fisher, Darlene Funches, Anna Izzi, Daryl Moorhead, and
Scott McBride. Jonathan Bossenbroek, Kerry Naish, and
William Von Sigler provided valuable comments on an
early version of the manuscript.
The research was funded by USEPA grants GL-00E01149-0
(to C.A.S. and W. Von Sigler) and GL-00E01898 (to C.A.S.
and Kevin Czajkowski); the latter
was partially subawarded from the University of Toledo to
the University of Washington Joint Institute for the Study
of the Atmosphere and Ocean (JISAO) for research at the
new Genetics and Genomics Group of C.A.S., NOAA
Pacific Marine Environmental Laboratory (PMEL). Ad-
ditional support was provided by NOAA OAR (Oceanic
and Atmospheric Research) ‘Omics funding to the Ge-
netics and Genomics Group led by C.A.S. M.R.S. was
partially supported by a graduate student fellowship from
the University of Toledo (2016-2019), conducted under
the advisement of C.A.S., and by JISAO (2019).
References
Alberdi A, Aizpurua O, Gilbert MTP, Bohmann K (2017) Scrutiniz-
ing key steps for reliable metabarcoding of environmental sam-
ples. Methods in Ecology and Evolution 9: 134-147. https://doi.
org/10.1111/2041-210X.12849
Allendorf FW, Lundquist LL (2003) Introduction: Population Biology,
Evolution, and Control of Invasive Species. Conservation Biology
17: 24-30. https://doi.org/10.1046/j.1523-1739.2003.02365.x
Attrill MJ, Depledge MH (1997) Community and population indicators
of ecosystem health: targeting links between levels of biological
organization. Aquatic Toxicology 38: 183–197. https://doi.
org/10.1016/S0166-445X(96)00839-9
Balmford A, Gaston KJ (1999) Why biodiversity surveys are good value.
Nature 398: 204–205. https://doi.org/10.1038/18339
Barnes MA, Turner CR (2016) The ecology of environmental DNA and
implications for conservation genetics. Conservation Genetics 17:
1-17. https://doi.org/10.1007/s10592-015-0775-4
Barnes MA, Turner CR, Jerde CL, Renshaw MA, Chadderton WL,
Lodge DM (2014) Environmental conditions influence eDNA per-
sistence in aquatic systems. Environmental Science and Technology
48: 1819-1827. https://doi.org/10.1021/es404734p
Begon M, Townsend CR, Harper JL (2006) Ecology: From Individuals
to Ecosystems, Fourth Edition. Wiley-Blackwell, Malden.
Bessey C, Jarman SN, Berry O, Olsen YS, Bunce M, Simpson T, Power
M, McLaughlin J, Edgar GJ, Keesing J (2020) Maximizing fish de-
tection with eDNA metabarcoding. Environmental DNA 00: 1-12.
https://doi.org/10.1002/edn3.74
Bylemans J, Gleeson DM, Lintermans M, Hardy CM, Beitzel M, Gil-
ligan DM, Furlan EM (2018) Monitoring riverine fish communi-
ties through eDNA metabarcoding: determining optimal sampling
strategies along an altitudinal and biodiversity gradient. Metabar-
coding and Metagenomics 2: e30457. https://doi.org/10.3897/
mbmg.2.30457
Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes
SP (2016) DADA2: high-resolution sample inference from Illu-
mina amplicon data. Nature Methods 13: 581-583. https://doi.
org/10.1038/nmeth.3869
Cantera I, Cilleros K, Valentini A, Cerdan A, Dejean T, Iribar A, Taber-
let P, Vigouroux R, Brosse S (2019) Optimizing environmental DNA
sampling effort for fish inventories in tropical streams and rivers.
Scientific Reports 9: 3085. https://doi.org/10.1038/s41598-019-
39399-5
Cilleros K, Valentini A, Allard L, Dejean T, Etienne R, Grenouillet G,
Iribar A, Taberlet P, Vigouroux R, Brosse S (2019) Unlocking bio-
diversity and conservation studies in high-diversity environments
using environmental DNA (eDNA): a test with Guianese fresh-
water fishes. Molecular Ecology Resources 19: 27-46. https://doi.
org/10.1111/1755-0998.12900
Civade R, Dejean T, Valentini A, Roset N, Raymond J-C, Bonin A,
Taberlet P, Pont D (2016) Spatial representativeness of environmen-
tal DNA metabarcoding signal for fish biodiversity assessment in
a natural freshwater system. PLoS ONE 11: e0157366. https://doi.
org/10.1371/journal.pone.0157366
Darling JA, Mahon AR (2011) From molecules to management: Adopt-
ing DNA-based methods for monitoring biological invasions in
aquatic environments. Environmental Research 111: 978-988.
https://doi.org/10.1016/j.envres.2011.02.001
Deiner K, Altermatt F (2014) Transport distance of invertebrate envi-
ronmental DNA in a natural river. PLoS ONE 9: e88786. https://doi.
org/10.1371/journal.pone.0088786
Deiner K, Bik HM, Machler E, Seymour M, Lacoursiere-Roussel A, Al-
termatt F, Creer S, Bista I, Lodge DM, Vere N de, Pfrender ME, Ber-
natchez L (2017) Environmental DNA metabarcoding: transforming
how we survey animal and plant communities. Molecular
Ecology 26: 5872-5895. https://doi.org/10.1111/mec.14350
Dobson AP, Rodriguez JP, Roberts WM, Wilcove DS (1997) Geograph-
ic distribution of endangered species in the United States. Science
275: 550-553. https://doi.org/10.1126/science.275.5299.550
Elton CS (1966) The Pattern of Animal Communities. Springer Nether-
lands, Rotterdam, The Netherlands.
Evans NT, Li Y, Renshaw MA, Olds BP, Deiner K, Turner CR, Jerde CL,
Lodge DM, Lamberti GA, Pfrender ME (2017) Fish community as-
sessment with eDNA metabarcoding: effects of sampling design and
bioinformatic filtering. Canadian Journal of Fisheries and Aquatic
Sciences 74: 1362-1374. https://doi.org/10.1139/cjfas-2016-0306
Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: a new series of
programs to perform population genetics analyses under Linux and
Windows. Molecular Ecology Resources 10: 564-567. https://doi.
org/10.1111/j.1755-0998.2010.02847.x
Fadrosh DW, Ma B, Gajer P, Sengamalay N, Ott S, Brotman RM, Ravel
J (2014) An improved dual-indexing approach for multiplexed 16S
rRNA gene sequencing on the Illumina MiSeq platform. Microbiome
2: 1-6. https://doi.org/10.1186/2049-2618-2-6
Fremier AK, Strickler KM, Parzych J, Powers S, Goldberg CS (2019)
Stream transport and retention of environmental DNA pulse releases
in relation to hydrogeomorphic scaling factors. Environmental Sci-
ence and Technology. https://doi.org/10.1021/acs.est.8b06829
Fujii K, Doi H, Matsuoka S, Nagano M, Sato H, Yamanaka H (2019)
Environmental DNA metabarcoding for fish community analysis in
backwater lakes: a comparison of capture methods. PLoS ONE 14:
e0210357. https://doi.org/10.1371/journal.pone.0210357
Gillet B, Cottet M, Destanque T, Kue K, Descloux S, Chanudet V,
Hughes S (2018) Direct fishing and eDNA metabarcoding for bio-
monitoring during a 3-year survey significantly improves number of
fish detected around a South East Asian reservoir. PLoS ONE 13:
e0208592. https://doi.org/10.1371/journal.pone.0208592
Hanfling B, Handley LL, Read DS, Hahn C, Li J, Nichols P, Blackman
RC, Oliver A, Winfield IJ (2016) Environmental DNA metabarcod-
ing of lake fish communities reflects long-term data from established
survey methods. Molecular Ecology 25: 3101-3119. https://doi.
org/10.1111/mec.13660
Hubbs CL, Lagler KF (2007) Fishes of the Great Lakes Region, Revised
Edition. University of Michigan Press, Ann Arbor, Michigan.
Jo T, Murakami H, Yamamoto S, Masuda R, Minamoto T (2019) Effect
of water temperature and fish biomass on environmental DNA shed-
ding, degradation, and size distribution. Ecology and Evolution 9:
1135-1146. https://doi.org/10.1002/ece3.4802
Kelly RP, Closek CJ, O’Donnell JL, Kralj JE, Shelton AO, Samhouri
JF (2017) Genetic and manual survey methods yield different and
complementary views of an ecosystem. Frontiers in Marine Science
3: 1-11. https://doi.org/10.3389/fmars.2016.00283
Khodakov D, Wang C, Zhang DY (2016) Diagnostics based on nucleic
acid sequence variant profiling: PCR, hybridization, and NGS
approaches. Advanced Drug Delivery Reviews 105: 3-19. https://
doi.org/10.1016/j.addr.2016.04.005
Klymus KE, Marshall NT, Stepien CA (2017) Environmental DNA
(eDNA) metabarcoding assays to detect invasive invertebrate
species in the Great Lakes. PLoS ONE 12: e0177643. https://doi.
org/10.1371/journal.pone.0177643
Kolar C, Chapman D, Courtenay W, Housel C, Williams J, Jennings D
(2005) Asian carps of the genus Hypophthalmichthys (Pisces,
Cyprinidae) – a biological synopsis and environmental risk
assessment. National Invasive Species Council Materials.
http://digitalcommons.unl.edu/natlinvasive/5
Lawson Handley L, Read DS, Winfield IJ, Kimbell H, Johnson H, Li J,
Hahn C, Blackman R, Wilcox R, Donnelly R, Szitenberg A, Han-
fling B (2019) Temporal and spatial variation in distribution of fish
environmental DNA in England’s largest lake. Environmental DNA
1: 26-39. https://doi.org/10.1002/edn3.5
MacConaill LE, Burns RT, Nag A, Coleman HA, Slevin MK, Giorda
K, Light M, Lai K, Jarosz M, McNeill MS, Ducar MD, Meyerson
M, Thorner AR (2018) Unique, dual-indexed sequencing adapters
with UMIs effectively eliminate index cross-talk and significantly
improve sensitivity of massively parallel sequencing. BMC Genom-
ics 19: 1-30. https://doi.org/10.1186/s12864-017-4428-5
Margules CR, Pressey RL, Williams PH (2002) Representing biodiver-
sity: Data and procedures for identifying priority areas for conser-
vation. Journal of Biosciences 27: 309-326. https://doi.org/10.1007/
BF02704962
Marshall NT, Stepien CA (2019) Invasion genetics from eDNA and
thousands of larvae: A targeted metabarcoding assay that distin-
guishes species and population variation of zebra and quagga
mussels. Ecology and Evolution 9: 1–24. https://doi.org/10.1002/
ece3.4985
Miya M, Sato Y, Fukunaga T, Sado T, Poulsen JY, Sato K, Minamoto T,
Yamamoto S, Yamanaka H, Araki H, Kondoh M, Iwasaki W (2015)
MiFish, a set of universal PCR primers for metabarcoding environ-
mental DNA from fishes: detection of more than 230 subtropical
marine species. Royal Society Open Science 2: 150088. https://doi.
org/10.1098/rsos.150088
Morin PJ (2009) Community Ecology. John Wiley & Sons, Oxford.
Müllner D (2013) Fastcluster: fast hierarchical, agglomerative
clustering routines for R and Python. Journal of Statistical Software
53: 1-18. https://doi.org/10.18637/jss.v053.i09
Myers N, Mittermeier RA, Mittermeier CG, Fonseca GAB da, Kent J
(2000) Biodiversity hotspots for conservation priorities. Nature 403:
853-858. https://doi.org/10.1038/35002501
Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y,
Ishikawa S, Linak MC, Hirai A, Takahashi H, Altaf-Ul-Amin M,
Ogasawara N, Kanaya S (2011) Sequence-specific error profile of
Illumina sequencers. Nucleic Acids Research 39: e90. https://doi.
org/10.1093/nar/gkr344
NOAA (2019) GLANSIS — Great Lakes Aquatic Nonindigenous Spe-
cies Information System. https://www.glerl.noaa.gov/glansis/ [Ac-
cessed 31 July 2020]
OEPA (2014) Biological and Water Quality Study of the Maumee
River and Auglaize River 2012–2013. Ohio EPA Division of Surface
Water. https://www.epa.ohio.gov/Portals/35/documents/MaumeeTSD_2014.pdf
OEPA (2015) Biological Criteria for the Protection of Aquatic Life:
Volume III. Standardized Biological Field Sampling and Laboratory
Methods for Assessing Fish and Macroinvertebrate Communities.
In: Ohio EPA Division of Surface Water.
https://epa.ohio.gov/portals/35/documents/BioCrit15_Vol3.pdf#page41
Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D,
Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH,
Szoecs E, Wagner H (2019) Package ‘vegan.’ CRAN. https://cran.
ism.ac.jp/web/packages/vegan [Accessed 31 July 2020]
Parsons KM, Everett M, Dahlheim M, Park L (2018) Water, water ev-
erywhere: environmental DNA can unlock population structure in
elusive marine species. Royal Society Open Science 5: 180537.
https://doi.org/10.1098/rsos.180537
Pont D, Rocle M, Valentini A, Civade R, Jean P, Maire A, Roset N,
Schabuss M, Zornig H, Dejean T (2018) Environmental DNA re-
veals quantitative patterns of fish biodiversity in large rivers despite
its downstream transportation. Scientific Reports 8: 10361. https://
doi.org/10.1038/s41598-018-28424-8
Port JA, O’Donnell JL, Romero-Maraccini OC, Leary PR, Litvin SY,
Nickols KJ, Yamahara KM, Kelly RP (2016) Assessing vertebrate
biodiversity in a kelp forest ecosystem using environmental DNA.
Molecular Ecology 25: 527-541. https://doi.org/10.1111/mec.13481
R Core Team (2018) R: a language and environment for statistical com-
puting. R Foundation for Statistical Computing, Vienna.
Rice WR (1989) Analyzing Tables of Statistical Tests. Evolution 43:
223-225. https://doi.org/10.2307/2409177
Robin ED, Wong R (1988) Mitochondrial DNA molecules and virtu-
al number of mitochondria per cell in mammalian cells. Journal
of Cellular Physiology 136: 507-513. https://doi.org/10.1002/
jcp.1041360316
Schirmer M, D’Amore R, Ijaz UZ, Hall N, Quince C (2016) Illumina er-
ror profiles: resolving fine-scale variation in metagenomic sequenc-
ing data. BMC Bioinformatics 17: 1-125. https://doi.org/10.1186/
s12859-016-0976-y
Shaw JLA, Clarke LJ, Wedderburn SD, Barnes TC, Weyrich LS,
Cooper A (2016) Comparison of environmental DNA metabar-
coding and conventional fish survey methods in a river system.
Biological Conservation 197: 131-138.
https://doi.org/10.1016/j.biocon.2016.03.010
Sigsgaard EE, Nielsen IB, Bach SS, Lorenzen ED, Robinson DP, Knud-
sen SW, Pedersen MW, Jaidah MA, Orlando L, Willerslev E, Moller
PR, Thomsen PF (2017) Population characteristics of a large whale
shark aggregation inferred from seawater environmental DNA. Na-
ture Ecology & Evolution 1: 0004. https://doi.org/10.1038/s41559-
016-0004
Simon T (2006) Biodiversity of fishes in the Wabash River: Status, indi-
cators, and threats. Proceedings of the Indiana Academy of Science
115: 136-148.
Smart AS, Weeks AR, Rooyen AR van, Moore A, McCarthy MA, Tin-
gley R (2016) Assessing the cost-efficiency of environmental DNA
sampling. Methods in Ecology and Evolution 7: 1291-1298. https://
doi.org/10.1111/2041-210X.12598
Snyder MR, Stepien CA (2017) Genetic patterns across an invasion’s
history: a test of change versus stasis for the Eurasian round goby
in North America. Molecular Ecology 26: 1075-1090. https://doi.
org/10.1111/mec.13997
Snyder MR, Stepien CA, Marshall NT, Scheppler HB, Black CL, Cza-
jkowski KP (2020) Detecting aquatic invasive species in bait and
pond stores with targeted environmental (e) DNA high-through-
put sequencing metabarcode assays: angler, retailer, and manager
implications. Biological Conservation 243: 108430. https://doi.
org/10.1016/j.biocon.2020.108430
Stepien CA, Snyder MR, Elz AE (2019) Invasion genetics of the sil-
ver carp Hypophthalmichthys molitrix across North America: Dif-
ferentiation of fronts, introgression, and eDNA metabarcode de-
tection. PLoS ONE 14: e0203012. https://doi.org/10.1371/journal.
pone.0203012
Stoeckle MY, Soboleva L, Charlop-Powers Z (2017) Aquatic envi-
ronmental DNA detects seasonal fish abundance and habitat pref-
erence in an urban estuary. PLoS ONE 12: e0175186. https://doi.
org/10.1371/journal.pone.0175186
Suzuki R, Shimodaira H (2015) Package “pvclust”: Hierarchical Clus-
tering with P-Values via Multiscale Bootstrap Resampling. CRAN.
https://cran.r-project.org/web/packages/pvclust/pvclust.pdf [Accessed
31 July 2020]
Thomsen PF, Moller PR, Sigsgaard EE, Knudsen SW, Jorgensen OA,
Willerslev E (2016) Environmental DNA from seawater samples
correlate with trawl catches of subarctic, deepwater fishes. PLoS
ONE 11: e0165252. https://doi.org/10.1371/journal.pone.0165252
Tsuji S, Miya M, Ushio M, Sato H, Minamoto T, Yamanaka H (2018)
Evaluating intraspecific diversity of a fish population using
environmental DNA: An approach to distinguish true haplotypes from
erroneous sequences. bioRxiv 429993. https://doi.org/10.1101/429993
Turner CR, Barnes MA, Xu CCY, Jones SE, Jerde CL, Lodge DM
(2014) Particle size distribution and optimal capture of aqueous
macrobial eDNA. Methods in Ecology and Evolution 5: 676-684.
https://doi.org/10.1111/2041-210X.12206
Valentini A, Taberlet P, Miaud C, Civade R, Herder J, Thomsen PF, Bel-
lemain E, Besnard A, Coissac E, Boyer F, Gaboriaud C, Jean P, Poulet
N, Roset N, Copp GH, Geniez P, Pont D, Argillier C, Baudoin J-M,
Peroux T, Crivelli AJ, Olivier A, Acqueberge M, Brun ML, Moller
PR, Willerslev E, Dejean T (2016) Next-generation monitoring of
aquatic biodiversity using environmental DNA metabarcoding. Mo-
lecular Ecology 25: 929-942. https://doi.org/10.1111/mec.13428
van Bleijswijk JDL, Engelmann JC, Klunder L, Witte HJ, Witte JI, Veer
HW van der (2020) Analysis of a coastal North Sea fish commu-
nity: Comparison of aquatic environmental DNA concentrations to
fish catches. Environmental DNA 00: 1-17. https://doi.org/10.1002/
edn3.67
van Bochove K, Bakker FT, Beentjes KK, Hemerik L, Vos RA, Grav-
endeel B (2020) Organic matter reduces the amount of detectable
environmental DNA in freshwater. Ecology and Evolution 10:
3647-3654. https://doi.org/10.1002/ece3.6123
West KM, Stat M, Harvey ES, Skepper CL, DiBattista JD, Richards ZT,
Travers MJ, Newman SJ, Bunce M (2020) eDNA metabarcoding
survey reveals fine-scale coral reef community variation across a re-
mote, tropical island ecosystem. Molecular Ecology 29: 1069-1086.
https://doi.org/10.1111/mec.15382
Williams KE, Huyvaert KP, Piaggio AJ (2017) Clearing muddied wa-
ters: Capture of environmental DNA from turbid waters. PLoS ONE
12: e0179282. https://doi.org/10.1371/journal.pone.0179282
Wu L, Wen C, Qin Y, Yin H, Tu Q, Nostrand JDV, Yuan T, Yuan M,
Deng Y, Zhou J (2015) Phasing amplicon sequencing on Illumina
Miseq for robust environmental microbial community analysis.
BMC Microbiology 15: 1-125. https://doi.org/10.1186/s12866-015-
0450-4
Xiong W, Li H, Zhan A (2016) Early detection of invasive species in
marine ecosystems using high-throughput sequencing: technical
challenges and possible solutions. Marine Biology 163: 1-139.
https://doi.org/10.1007/s00227-016-2911-1
Zaiko A, Pochon X, Garcia-Vazquez E, Olenin S, Wood SA (2018)
Advantages and limitations of environmental DNA/RNA tools for
marine biosecurity: management and surveillance of non-indig-
enous species. Frontiers in Marine Science 5: 1-17. https://doi.
org/10.3389/fmars.2018.00322
Zar JH (2010) Biostatistical Analysis, Fifth Edition. Pearson, Upper
Saddle River, NJ.
Zinger L, Bonin A, Alsos IG, Balint M, Bik H, Boyer F, Chariton AA,
Creer S, Coissac E, Deagle BE, Barba MD, Dickie IA, Dumbrell AJ,
Ficetola GF, Fierer N, Fumagalli L, Gilbert MTP, Jarman S, Jump-
ponen A, Kauserud H, Orlando L, Pansu J, Pawlowski J, Tedersoo
L, Thomsen PF, Willerslev E, Taberlet P (2019) DNA metabarcoding –
Need for robust experimental designs to draw sound ecological
conclusions. Molecular Ecology 28: 1857-1862. https://doi.
org/10.1111/mec.15060
Supplementary material 1
Additional methods, results, figures, and tables
Authors: Matthew R. Snyder, Carol A. Stepien
Data type: Additional methods, results, figures, tables
Copyright notice: This dataset is made available under the
Open Database License (http://opendatacommons.org/licenses/odbl/1.0/).
The Open Database License (ODbL) is a license
agreement intended to allow users to freely share, modify, and
use this Dataset while maintaining this same freedom for oth-
ers, provided that the original source and author(s) are credited.
Link: https://doi.org/10.3897/mbmg.4.53455.suppl1